DataWorks: Differences between workspaces in basic mode and workspaces in standard mode

Last Updated: Oct 12, 2023

DataWorks provides workspaces in basic mode and in standard mode so that you can develop data under different security control requirements. This topic describes how the two modes differ, including their physical architectures and their impacts on node development.

Background information

This topic consists of the following sections:

  • Workspaces in basic and standard modes: Describes the physical architectures of workspaces in basic and standard modes.

  • Impacts of different workspace modes on development and O&M of nodes in the production environment: Describes mechanisms for node development and O&M based on the physical attributes of the workspace to which nodes belong.

  • Advantages and disadvantages of different workspace modes: Describes the advantages and disadvantages of different workspace modes.

  • Compute engines that correspond to different DataWorks service modules in workspaces in basic and standard modes: Describes the compute engines that are used when you perform operations on different DataWorks service modules in workspaces in basic and standard modes. Workspaces in basic mode provide only the production environment. Workspaces in standard mode provide both the development and production environments.

  • How to isolate data between development and production environments for a workspace in basic mode: Describes how to isolate data between the development and production environments if you use a workspace in basic mode.

Precautions

  • Workspaces in different modes have different requirements for compute engines. For example, for a workspace in standard mode, you must associate physically isolated compute engines with the development environment and the production environment of the workspace. This way, data can be isolated between the environments. For information about how to associate compute engines with a DataWorks workspace, see Associate compute engines with a workspace and manage the compute engines.

  • The features of the compute engine that you associate with a DataWorks workspace determine whether resources can be accessed across projects or databases. If you associate different instances, projects, or databases with the development and production environments of a workspace, the features of these compute engines determine whether objects in the production environment, such as tables, resources, and functions, can be accessed from the development environment.

  • By default, for a workspace in standard mode, nodes in the development environment are not periodically scheduled, and nodes are periodically scheduled only after they are deployed to the production environment.

Workspaces in basic and standard modes

This section compares the physical architectures of workspaces in basic and standard modes.

Note

You can create a workspace in basic or standard mode based on your business requirements. We recommend that you create a workspace in standard mode to develop data because, in this mode, code, computing resources, permission management, and node deployment process control are isolated between the development and production environments.

If you use a workspace in basic mode and you want to retain the code in the current workspace, you can upgrade the mode of your workspace. For more information, see Scenario: Upgrade a workspace from the basic mode to the standard mode.

The physical architectures of workspaces in basic mode and standard mode (recommended) differ in the following aspects.

Number of associated compute engines

  • Basic mode: One DataWorks workspace corresponds to one compute engine, such as a project, instance, or database.

  • Standard mode (recommended): One DataWorks workspace corresponds to two compute engines, such as projects, instances, or databases. This way, the compute engines are isolated between the development and production environments.

    Note

    You must associate compute engines that are physically isolated with the workspace in the development and production environments to isolate data between the environments.

DataWorks environment

  • Basic mode: One compute engine, such as a project, instance, or database, serves as the DataWorks production environment.

  • Standard mode (recommended): One of the compute engines, such as projects, instances, or databases, serves as the DataWorks development environment, and the other compute engine serves as the DataWorks production environment.

    Note

    You can associate different compute engines with a workspace in the development and production environments. Examples:

      • You can associate different instances with a workspace in the development and production environments.

      • You can associate different projects or databases in the same instance with a workspace in the development and production environments.
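The association rules above can be pictured as a small data model. The following is a minimal sketch only: `ComputeEngine`, `Workspace`, and `validate` are hypothetical names invented for illustration and are not part of any DataWorks API. Comparing the instance and project or database names is an assumed stand-in for "physically isolated".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComputeEngine:
    """Hypothetical stand-in for an associated project, instance, or database."""
    instance: str
    name: str  # project or database name inside the instance


@dataclass(frozen=True)
class Workspace:
    """Hypothetical model of a workspace and the engines associated with it."""
    mode: str  # "basic" or "standard"
    production: ComputeEngine
    development: Optional[ComputeEngine] = None

    def validate(self) -> None:
        if self.mode == "basic":
            # Basic mode: one engine, which serves as the production environment.
            if self.development is not None:
                raise ValueError("basic mode has only a production environment")
        elif self.mode == "standard":
            # Standard mode: two engines, one per environment.
            if self.development is None:
                raise ValueError("standard mode needs a development engine")
            # Assumption: differing (instance, name) pairs approximate isolation.
            if self.development == self.production:
                raise ValueError("development and production engines must differ")
        else:
            raise ValueError(f"unknown mode: {self.mode}")


# Different projects in the same instance are accepted in standard mode.
Workspace(
    mode="standard",
    production=ComputeEngine("instance_a", "sales"),
    development=ComputeEngine("instance_a", "sales_dev"),
).validate()
```

In this sketch, a standard-mode workspace that reuses the same engine for both environments is rejected, which mirrors the requirement that the two environments be isolated.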

Impacts of different workspace modes on development and O&M of nodes in the production environment

The following items compare how basic mode and standard mode (recommended) affect node development and O&M in the production environment.

Differences in development process control for nodes in the production environment

  • Basic mode (commit to the production environment): After you commit a node, the node enters the scheduling system. Then, the node is periodically scheduled to generate data.

  • Standard mode (commit to the development environment and deploy to the production environment): You must first commit a node to the development environment and then deploy the node to the production environment for automatic scheduling.

    Note

    For a workspace in standard mode, only nodes in the production environment can be automatically scheduled.
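The commit and deploy flow described above can be sketched as a small state model. This is an illustrative sketch under assumptions: `Node`, `commit`, `deploy`, and `is_periodically_scheduled` are hypothetical names, not DataWorks functions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node record; not a DataWorks object."""
    name: str
    committed: bool = False
    deployed: bool = False


def commit(node: Node) -> None:
    """Commit the node (to production in basic mode, to development in standard mode)."""
    node.committed = True


def deploy(node: Node) -> None:
    """Deploy a committed node to the production environment (standard mode step)."""
    if not node.committed:
        raise RuntimeError("commit the node before deploying it")
    node.deployed = True


def is_periodically_scheduled(node: Node, mode: str) -> bool:
    """Return whether the node is picked up for periodic scheduling."""
    if mode == "basic":
        # Basic mode: a committed node enters the scheduling system directly.
        return node.committed
    if mode == "standard":
        # Standard mode: only nodes deployed to production are scheduled.
        return node.committed and node.deployed
    raise ValueError(f"unknown mode: {mode}")
```

The extra `deploy` step is the only difference between the two branches, which is the control point that standard mode adds.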

Differences in O&M permission management for nodes in the production environment

  • Basic mode: Developers can directly modify code of nodes in the production environment.

  • Standard mode (recommended): Developers can only modify and commit node code on the DataStudio page, but cannot directly deploy node code to the production environment. You can deploy node code only after you are assigned the Project Owner, Workspace Manager, or O&M role.

      • You can modify code only in the development environment. You cannot modify code in the production environment.

      • You can plan and control the node development and O&M processes in DataWorks based on the features of workspaces in standard mode and the DataWorks role permission system.
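The standard-mode deployment rule described above can be expressed as a simple role check. This is a sketch only: the role names come from the text, while `DEPLOY_ROLES` and `can_deploy_to_production` are hypothetical names and do not reflect how DataWorks evaluates permissions internally.

```python
from typing import Iterable

# Role names as given in the text; the set and function are illustrative only.
DEPLOY_ROLES = frozenset({"Project Owner", "Workspace Manager", "O&M"})


def can_deploy_to_production(roles: Iterable[str]) -> bool:
    """Return True if any of the user's roles permits deployment in standard mode."""
    return not DEPLOY_ROLES.isdisjoint(roles)
```

Under this model, a user who holds only the Development role can commit code but fails the check, while a user who also holds the O&M role passes it.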

Differences in permission management for data in the production environment

  • Basic mode: Developers can directly perform tests by using production data. As a result, the security of production data cannot be ensured.

  • Standard mode (recommended): Developers can perform tests by using test data in the development environment. Developers can also verify features by using production table data in the development environment after they obtain the required permissions or after their request to perform the operations is approved in Security Center.

    Note

      • Only MaxCompute compute engines allow you to request permissions on tables in the production environment in a visualized manner in Security Center. For information about how to manage permissions on MaxCompute, see Manage permissions on data in a MaxCompute compute engine instance.

      • The features of the compute engine that you associate with a DataWorks workspace determine whether resources can be accessed across projects or databases. If you associate different instances, projects, or databases with the development and production environments of a workspace, the features of these compute engines determine whether objects in the production environment, such as tables, resources, and functions, can be accessed from the development environment.

Differences in data access identities

  • Basic mode: A unified identity is used to directly perform operations in the production environment.

    Access identities for compute engines such as MaxCompute, Hologres, E-MapReduce (EMR), and Cloudera's Distribution including Apache Hadoop (CDH): Alibaba Cloud account, RAM user, RAM role (supported only by MaxCompute), and node owner.

    Note

    If you associate a compute engine other than the preceding types of compute engines, such as an AnalyticDB for MySQL or AnalyticDB for PostgreSQL compute engine, with a workspace in this mode, only the database account that you specified when you configure the compute engine can perform operations in the specific environment. The permissions of this account in the DataWorks workspace are the same as those in the AnalyticDB for MySQL or AnalyticDB for PostgreSQL database.

  • Standard mode (recommended):

      • Development environment: By default, the node executor (current logon user) tests nodes.

      • Production environment: A specified identity is used to schedule nodes. You can specify the accounts or roles that can be used to access compute engines for workspaces in different modes in the Compute Engine Information section of the Workspace Management page. You can also change the account or role that is used to access the compute engine for the workspace in the production environment in the same section. For information about the identities that are used to access different types of compute engines for a workspace in the development and production environments, see Associate compute engines with a workspace and manage the compute engines.

    Note

    MaxCompute, Hologres, EMR, and CDH

      • Development environment: node owner

      • Production environment: Alibaba Cloud account, RAM user, and RAM role (supported only by MaxCompute)

    If you associate a compute engine other than the preceding types of compute engines, such as an AnalyticDB for MySQL or AnalyticDB for PostgreSQL compute engine, with a workspace in this mode, only the database account that you specified when you configure the compute engine can perform operations in the specific environment. The permissions of this account in the DataWorks workspace are the same as those in the AnalyticDB for MySQL or AnalyticDB for PostgreSQL database.

Advantages and disadvantages of different workspace modes

Advantages

  • Basic mode: Workspaces in this mode are simple and easy to use. You only need to assign the Development role to development engineers so that they can complete all data warehouse development operations.

  • Standard mode: Workspaces in this mode are secure and standardized.

      • A secure and standardized process is provided to allow you to deploy and manage node code, including features such as code review and code check by using the diff command. This ensures the stability of the production environment and prevents problems such as the spread of dirty data and node errors caused by faulty code.

      • Data activities are managed in an efficient manner and data security is ensured.

Disadvantages

  • Basic mode: The risks of instability and low data security may arise in the production environment.

      • A workspace in basic mode provides only the production environment. As a result, data cannot be isolated and you can perform only basic data development operations.

      • You cannot manage permissions on production tables.

        Note

        If you associate a MaxCompute project with a workspace in basic mode, users that are assigned the Development role have read and write permissions on all tables of the MaxCompute project by default and can add, delete, or modify tables. This increases data security risks.

      • You cannot control the data development process.

        Note

        Users that are assigned the Development role can add, modify, or commit node code to the scheduling system without the need to obtain approval. This may cause instability and uncertainty in data production.

  • Standard mode: The data development and production process is more complex. In most cases, the process involves more than one developer.

Appendix: Compute engines that correspond to different DataWorks service modules in workspaces in basic and standard modes

You can view the information about compute engines that are associated with your workspace in the Compute Engine Information section of the Workspace Management page. The compute engines that are used when you perform operations on different DataWorks service modules in workspaces in basic and standard modes are as follows.

Standard mode

  • DataStudio: The compute engine such as an instance, project, or database in the development environment is used.

  • Operation Center in the development environment: The compute engine such as an instance, project, or database in the development environment is used.

  • Operation Center in the production environment: The compute engine such as an instance, project, or database in the production environment is used.

Basic mode

  • DataStudio and Operation Center: The compute engine such as an instance, project, or database in the production environment is used. A workspace in basic mode provides only the production environment.
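The mapping between service modules and environments can be summarized as a lookup function. This is a minimal sketch with hypothetical names (`engine_environment` and its parameters are not a DataWorks API). The basic-mode branch relies on the statement in this topic that a workspace in basic mode provides only the production environment.

```python
def engine_environment(mode: str, module: str,
                       operation_center_env: str = "production") -> str:
    """Return which environment's compute engine a service module uses.

    Hypothetical helper for illustration. `operation_center_env` selects which
    Operation Center (development or production) is meant in standard mode.
    """
    if mode == "basic":
        # Basic mode provides only the production environment.
        return "production"
    if mode != "standard":
        raise ValueError(f"unknown mode: {mode}")
    if module == "DataStudio":
        return "development"
    if module == "Operation Center":
        if operation_center_env not in ("development", "production"):
            raise ValueError(f"unknown environment: {operation_center_env}")
        return operation_center_env
    raise ValueError(f"unknown service module: {module}")
```

For example, `engine_environment("standard", "DataStudio")` resolves to the development engine, while the same call with `"basic"` resolves to the production engine.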

Appendix: How to isolate data between development and production environments for a workspace in basic mode

Requirement: You use a workspace in basic mode and want to isolate data between the development and production environments.

Solution: Prepare two workspaces in basic mode. Workspace 1 serves as the development environment and Workspace 2 serves as the production environment. Use the cross-workspace deployment method to deploy nodes in Workspace 1 to Workspace 2. This way, data can be isolated between the environments.

Disadvantages: You can still directly modify production code in DataStudio in the workspace that serves as the production environment. As a result, production code can be updated through more than one entry point, which leads to inconsistent code updates and affects the entire development process.

Suggestion: We recommend that you upgrade your workspace from basic mode to standard mode for better control of the development process. For more information, see Scenario: Upgrade a workspace from the basic mode to the standard mode.