DataWorks allows you to associate compute engines with a workspace. After a compute engine is associated with a workspace, it serves as a compute engine instance of the workspace. You can then periodically run nodes of the corresponding compute engine type on it to develop data in DataWorks. This topic describes how to associate compute engines with a DataWorks workspace and how to manage them.
Prerequisites
You understand the physical architectures of workspaces in basic mode and standard mode and how each mode affects node development. For more information, see Differences between workspaces in basic mode and workspaces in standard mode.
Background information
Before you associate a compute engine with a DataWorks workspace, make sure that you understand how the compute engine environments relate to the DataWorks service modules in the workspace. This helps you determine which compute engine, such as a database or a cluster, is used in each environment. You can view the compute engines that are associated with a workspace on the Computing engine information tab of the Workspace page in SettingCenter.
Precautions (important)
- Background information
  - A workspace in standard mode has two environments: development and production. You can associate a different compute engine with each environment to run nodes. However, only nodes in the production environment can be automatically scheduled. Nodes are scheduled by a single identity, and the account that assumes this identity must have high permissions on the compute engine.
  - A workspace in basic mode has only the production environment. The same compute engine is used both for tests in DataStudio and for node scheduling in the production environment. You can specify only one account or node owner to run nodes and perform operations on production data.
- Confirmation items
  Before you associate a compute engine with a workspace, confirm the following items.
  - Category: Workspace mode and the compute engines to associate with the workspace in each environment
    - Item: If you use a workspace in standard mode, confirm whether you want to associate physically isolated clusters, databases, or instances with the workspace in the development and production environments.
      Description: This determines whether the code and resources in the development environment are isolated from the code and resources in the production environment.
    - Item: Plan and confirm the names of the compute engines that you associate with the workspace in the development and production environments.
      Description: This determines how you query table data in each environment. Examples:
      - You can determine which compute engines a node reads data from or writes data to when you test the node on the DataStudio page and when the node is automatically scheduled in Operation Center in the production environment.
      - You can determine the SQL statements that you use to query a compute engine, or the data storage path that it uses.
  - Category: Identity that you use to access data and the permissions of the identity
    - Item: Plan and confirm the accounts that you use in the development and production environments.
      Description: This relates to data security and permission management in subsequent data development.
      - A workspace in standard mode has two environments: development and production. Confirm the account that you use to test nodes in DataStudio in the development environment and the account that you use to schedule nodes in Operation Center in the production environment.
      - A workspace in basic mode has only the production environment. Confirm the account that you use both to test nodes in DataStudio and to schedule nodes in Operation Center in the production environment.
    - Item: Grant the account that you use in the production environment high permissions on the compute engine, or at least the permissions that are required to run nodes.
      Description: This determines whether nodes fail to run because the account that you select for the production environment has insufficient permissions.
      Note By default, the account that is used to schedule nodes in the production environment is an Alibaba Cloud account. If you change it to a RAM user, nodes may fail to run because the RAM user has insufficient permissions. For example, a node may run successfully on the DataStudio page but fail in the production environment because the RAM user has no permissions on the tables that the node uses.
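The environment rules and confirmation items above can be sketched as a small planning model in Python. This is a minimal sketch under stated assumptions, not a DataWorks API: `Mode`, `EnvBinding`, `WorkspacePlan`, and the `"datastudio"`/`"scheduling"` context strings are hypothetical names. It shows which compute engine and account a node uses when you test it in DataStudio versus when it is scheduled, and it flags plans that do not match the workspace mode.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    BASIC = "basic"        # production environment only
    STANDARD = "standard"  # development and production environments


@dataclass
class EnvBinding:
    """Hypothetical record: compute engine and account used in one environment."""
    engine: str
    account: str


@dataclass
class WorkspacePlan:
    """Hypothetical planning model; not part of any DataWorks API."""
    mode: Mode
    prod: EnvBinding
    dev: Optional[EnvBinding] = None  # used only in standard mode

    def binding_for(self, context: str) -> EnvBinding:
        """Return the binding used for 'datastudio' tests or 'scheduling' runs."""
        if context == "scheduling":
            # Only nodes in the production environment are automatically scheduled.
            return self.prod
        if context != "datastudio":
            raise ValueError(f"unknown context: {context!r}")
        if self.mode is Mode.BASIC:
            # Basic mode: one engine serves both DataStudio tests and scheduling.
            return self.prod
        if self.dev is None:
            raise ValueError("standard mode needs a development-environment binding")
        return self.dev

    def check(self) -> List[str]:
        """Return warnings about the plan; an empty list means none were found."""
        warnings: List[str] = []
        if self.mode is Mode.STANDARD and self.dev is None:
            warnings.append("standard mode: no development-environment engine planned")
        if self.mode is Mode.BASIC and self.dev is not None:
            warnings.append("basic mode has no development environment")
        if (self.mode is Mode.STANDARD and self.dev is not None
                and self.dev.engine == self.prod.engine):
            warnings.append("development and production share one engine (not isolated)")
        return warnings
```

For example, a standard-mode plan with `dev=EnvBinding("mc_dev", "developer")` resolves DataStudio tests to `mc_dev` and scheduled runs to the production binding. Whether the production account actually holds sufficient permissions still has to be confirmed on the compute engine itself; the sketch does not model permissions.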
Limits
- The maximum number of compute engines that can be associated with a workspace varies based on the DataWorks edition. For more information, see Differences among DataWorks editions.
- A compute engine can be associated with only one DataWorks workspace.
- Only the workspace administrator can associate a compute engine with a workspace or disassociate a compute engine from a DataWorks workspace. Some compute engines require the workspace administrator to have other related permissions. For more information, see the topics about how to associate a specific type of compute engine with a workspace.
- If you associate different instances, projects, or databases with a workspace in the development and production environments, the features of the compute engine determine whether you can access objects in the production instance, project, or database, such as tables, resources, and functions, from the development environment. Examples:
  - MaxCompute supports access to tables across projects. If you associate two MaxCompute projects with a workspace in standard mode to isolate development data from production data, you can query table data in the production environment on the DataStudio page.
  - Hologres does not support access to tables across databases. If you associate two Hologres databases with a workspace in standard mode to isolate development data from production data, you cannot query table data in the production environment on the DataStudio page.
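The limits above can be modeled as a small in-memory registry. This is a hedged sketch, not a DataWorks API: `EngineRegistry`, `BindingError`, and `can_query_prod_from_dev` are hypothetical names. The per-workspace maximum is a constructor parameter because the real value depends on your DataWorks edition, and only the MaxCompute and Hologres cross-environment behaviors stated in this topic are encoded; any other engine type returns `None` (unknown).

```python
from typing import Dict, Optional, Set


class BindingError(Exception):
    """Raised when an association would violate one of the limits."""


# From this topic: MaxCompute supports cross-project table access; Hologres does
# not support cross-database table access. Other engine types are left unknown.
CROSS_ENV_TABLE_ACCESS: Dict[str, bool] = {"MaxCompute": True, "Hologres": False}


def can_query_prod_from_dev(engine_type: str) -> Optional[bool]:
    """True/False if this topic states the behavior, None if it does not."""
    return CROSS_ENV_TABLE_ACCESS.get(engine_type)


class EngineRegistry:
    """Hypothetical model of the association limits; not a DataWorks API."""

    def __init__(self, max_per_workspace: int) -> None:
        # The real maximum depends on the DataWorks edition; pass your own value.
        self.max_per_workspace = max_per_workspace
        self.owner: Dict[str, str] = {}       # engine id -> workspace
        self.bound: Dict[str, Set[str]] = {}  # workspace -> engine ids

    def associate(self, workspace: str, engine_id: str, is_admin: bool) -> None:
        if not is_admin:
            raise BindingError("only the workspace administrator can associate engines")
        current = self.owner.get(engine_id)
        if current is not None and current != workspace:
            # A compute engine can be associated with only one workspace.
            raise BindingError(f"{engine_id} is already associated with {current}")
        engines = self.bound.setdefault(workspace, set())
        if engine_id not in engines and len(engines) >= self.max_per_workspace:
            raise BindingError(f"{workspace} reached its limit of {self.max_per_workspace}")
        engines.add(engine_id)
        self.owner[engine_id] = workspace

    def disassociate(self, workspace: str, engine_id: str, is_admin: bool) -> None:
        if not is_admin:
            raise BindingError("only the workspace administrator can disassociate engines")
        if self.owner.get(engine_id) != workspace:
            raise BindingError(f"{engine_id} is not associated with {workspace}")
        self.bound[workspace].discard(engine_id)
        del self.owner[engine_id]
```

The registry checks only the rules listed above. It does not model the additional permissions that some compute engine types require from the administrator.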
Go to the Workspace Management page
- Method 1: In the DataWorks console, find the desired workspace, move the pointer over the icon in the Actions column, and then select Workspace Settings. On the Workspace page, click the Computing engine information tab.
- Method 2: Find the desired workspace and click the name of a DataWorks module in the Actions column, such as DataStudio or Data Integration. In the top navigation bar of the page that appears, click the icon. On the Workspace page, click the Computing engine information tab.
Associate a compute engine with a DataWorks workspace
You can associate a compute engine with a DataWorks workspace when you create the workspace or on the Workspace Management page.
- On the Computing engine information tab, click the tab of the desired compute engine and click Add instance.
- In the dialog box that appears, configure the parameters based on your business requirements. The parameters depend on the compute engine type. For details, see the following topics:
- Associate a MaxCompute compute engine with a workspace
- Associate an EMR compute engine with a workspace
- Associate a Hologres compute engine with a workspace
- Associate an AnalyticDB for PostgreSQL compute engine with a workspace
- Associate an AnalyticDB for MySQL compute engine with a workspace
- Associate a CDH compute engine with a workspace
- Associate a ClickHouse compute engine with a workspace
Set a compute engine instance as the default compute engine instance
If a workspace has multiple compute engine instances of the same compute engine type, you can set one of them as the default compute engine instance for data development: On the Computing engine information tab, find the desired compute engine instance and click Set as default instance in the upper-right corner.
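Default-instance selection can be pictured as a per-type lookup. In this sketch, `EngineInstances` and its methods are hypothetical names, and the rule that the first added instance of a type becomes the default until you change it is an assumption made for illustration, not documented DataWorks behavior.

```python
from typing import Dict, List, Optional


class EngineInstances:
    """Hypothetical model: several instances per engine type, one default each."""

    def __init__(self) -> None:
        self.instances: Dict[str, List[str]] = {}  # engine type -> instance names
        self.default: Dict[str, str] = {}          # engine type -> default instance

    def add(self, engine_type: str, name: str) -> None:
        self.instances.setdefault(engine_type, []).append(name)
        # Assumption: the first instance of a type is the default until changed.
        self.default.setdefault(engine_type, name)

    def set_default(self, engine_type: str, name: str) -> None:
        """Equivalent of clicking Set as default instance for one instance."""
        if name not in self.instances.get(engine_type, []):
            raise KeyError(f"{name} is not a {engine_type} instance of this workspace")
        self.default[engine_type] = name

    def resolve(self, engine_type: str, requested: Optional[str] = None) -> str:
        """Use an explicitly requested instance, otherwise the type's default."""
        if requested is not None:
            if requested not in self.instances.get(engine_type, []):
                raise KeyError(f"{requested} is not a {engine_type} instance")
            return requested
        if engine_type not in self.default:
            raise KeyError(f"no {engine_type} instance is associated")
        return self.default[engine_type]
```

Keeping the default per engine type means that changing the MaxCompute default has no effect on, for example, the Hologres default.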
Disassociate a compute engine from a DataWorks workspace
- Evaluate the impacts. After you disassociate a compute engine from a workspace or delete it, you can no longer develop data based on the compute engine, and related nodes fail to run. The operation has the following possible impacts:
- Node scheduling: Nodes that are run based on the compute engine fail to run.
- Data Integration: The synchronization nodes that are run based on the compute engine fail to run. We recommend that you replace the compute engine with another data source for the synchronization nodes on the DataStudio page.
- DataService Studio: The APIs that are related to the compute engine fail to be called. We recommend that you replace the compute engine with another data source for the APIs.
- DataAnalysis: You cannot query data that is related to the compute engine. We recommend that you replace the compute engine with another data source in DataAnalysis.
- The information about the compute engine is not displayed in the Data Map, Resource Optimization, Comprehensive Data Governance, and Security Center modules.
- Disassociate a compute engine from a DataWorks workspace. On the Computing engine information tab of the Workspace page in SettingCenter, find the desired compute engine instance and click Delete or Unbind in the upper-right corner.
  Note This operation only disassociates the compute engine from the workspace. It does not delete the compute engine from the console of the compute engine. After the disassociation, you can still view and manage the compute engine in its own console.
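Before you unbind a compute engine, it can help to list everything that depends on it. The following sketch assumes that you already have an inventory of objects (nodes, synchronization nodes, APIs, queries) and the compute engine that each one uses; how you collect that inventory is not covered in this topic. `Dependency`, `IMPACT_BY_MODULE`, and `evaluate_unbind` are hypothetical names. The impact text mirrors the list of impacts above, and the change in what the Data Map, Resource Optimization, Comprehensive Data Governance, and Security Center modules display is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List

# Impacts described in this topic, keyed by the affected DataWorks module.
IMPACT_BY_MODULE: Dict[str, str] = {
    "Node scheduling": "node fails to run",
    "Data Integration": "synchronization node fails to run; switch it to another data source",
    "DataService Studio": "API call fails; switch the API to another data source",
    "DataAnalysis": "query fails; switch to another data source",
}


@dataclass
class Dependency:
    """Hypothetical inventory entry: an object in a module that uses an engine."""
    module: str
    name: str
    engine: str


def evaluate_unbind(engine: str, inventory: List[Dependency]) -> Dict[str, List[str]]:
    """Group, by module, the objects that would break if `engine` were unbound."""
    report: Dict[str, List[str]] = {}
    for dep in inventory:
        if dep.engine != engine:
            continue  # this object uses a different engine and is unaffected
        effect = IMPACT_BY_MODULE.get(dep.module, "review manually")
        report.setdefault(dep.module, []).append(f"{dep.name}: {effect}")
    return report
```

An empty report means that no object in your inventory uses the engine. It does not guarantee that nothing else depends on it, because the result is only as complete as the inventory you pass in.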