DataWorks: Differences between workspaces in basic mode and workspaces in standard mode

Last Updated: Apr 10, 2024

DataWorks provides workspaces in basic and standard modes so that you can choose the mode that matches your security control requirements for data development. This topic describes the differences between the two modes, including the differences in physical architecture and the impacts on node development.

Background information

This topic consists of the following sections:

  • Workspaces in basic and standard modes: Describes the physical architectures of workspaces in basic and standard modes.

  • Impacts of different workspace modes on development and O&M of nodes in the production environment: Describes mechanisms for node development and O&M based on the physical attributes of the workspace to which nodes belong.

  • Advantages and disadvantages of different workspace modes: Describes the advantages and disadvantages of different workspace modes.

  • Diagram of impacts of workspaces in standard mode on usage processes: Describes the process control that is implemented based on collaboration among users that are assigned different roles in a workspace in standard mode.

  • Data sources that are used when you perform operations on different DataWorks modules in workspaces in basic and standard modes: Describes the data sources that are used when you perform operations on different DataWorks service modules in workspaces in basic and standard modes. Workspaces in basic mode provide only the production environment. Workspaces in standard mode provide both the development and production environments.

  • How to isolate data between development and production environments for a workspace in basic mode: Describes how to isolate data between the development and production environments if you use a workspace in basic mode.

Precautions

  • Workspaces in different modes have different requirements for the addition of a data source. For example, for a workspace in standard mode, you must separately add data sources to the workspace in the development and production environments. This way, data can be physically isolated between the environments. For information about how to add data sources to a DataWorks workspace, see Add and manage data sources.

  • The characteristics of the data source that you add to a DataWorks workspace determine whether resources can be accessed across projects or databases. If you add different data sources to a workspace in the development and production environments, the characteristics of the data sources determine whether you can access objects, such as tables, resources, and functions in the production environment from the development environment.

  • By default, for a workspace in standard mode, nodes in the development environment are not periodically scheduled, and nodes are periodically scheduled only after they are deployed to the production environment.

Workspaces in basic and standard modes

Note

You can create a workspace in basic or standard mode based on your business requirements. We recommend that you create a workspace in standard mode for data development because code, computing resources, permission management, and node deployment process control are isolated between the development and production environments in this mode.

If you use a workspace in basic mode and you want to retain the code in the current workspace, you can upgrade the mode of your workspace. For more information, see Scenario: Upgrade a workspace from the basic mode to the standard mode.

The following comparison describes the differences in the physical architecture between workspaces in basic and standard modes.

Number of added data sources

  • Basic mode: One DataWorks workspace corresponds to one data source.

  • Standard mode (recommended): One DataWorks workspace corresponds to two data sources. This way, the data sources are isolated between the development and production environments.

    Note

    You must separately add data sources to the workspace in the development and production environments to physically isolate data between the environments.

DataWorks environment

  • Basic mode: One data source serves as the DataWorks production environment.

  • Standard mode (recommended): One of the data sources serves as the DataWorks development environment, and the other data source serves as the DataWorks production environment.

    Note

    You can add different data sources to a workspace in the development and production environments. For example:

      • You can add different instances to a workspace in the development and production environments.

      • You can add different projects or databases in the same instance to a workspace in the development and production environments.
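
The one-versus-two data source rule described above can be sketched as a small validation model. This is an illustration only, not DataWorks code: the `DataSource` and `Workspace` classes and their field names are hypothetical, and the check that the two data sources must differ is an assumption drawn from the isolation requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSource:
    """Hypothetical stand-in for a data source: an instance plus a project or database."""
    instance: str
    project_or_db: str


@dataclass(frozen=True)
class Workspace:
    """Hypothetical model of a workspace and the data sources added to it."""
    name: str
    mode: str  # "basic" or "standard"
    prod: DataSource
    dev: Optional[DataSource] = None

    def __post_init__(self) -> None:
        if self.mode not in ("basic", "standard"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.mode == "basic" and self.dev is not None:
            raise ValueError("basic mode has only a production data source")
        if self.mode == "standard":
            if self.dev is None:
                raise ValueError("standard mode needs a development data source")
            if self.dev == self.prod:
                # Assumption: isolation requires two distinct data sources.
                raise ValueError("development and production data sources must differ")


# Same instance, different projects: one of the two layouts that the note allows.
standard = Workspace(
    name="sales",
    mode="standard",
    prod=DataSource("instance_a", "sales_prod"),
    dev=DataSource("instance_a", "sales_dev"),
)
basic = Workspace(name="sandbox", mode="basic", prod=DataSource("instance_a", "sandbox"))
```

Modeling the development data source as optional keeps a single class for both modes, and the constructor rejects combinations that the comparison above rules out.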

Impacts of different workspace modes on development and O&M of nodes in the production environment

Differences in development process control for nodes in the production environment

  • Basic mode: After you commit a node, the node enters the scheduling system. Then, the node is periodically scheduled to generate data. (Commit to the production environment)

  • Standard mode (recommended): You must first commit a node to the development environment and then deploy the node to the production environment for automatic scheduling. (Commit to the development environment and deploy to the production environment)

    Note

    For a workspace in standard mode, only nodes in the production environment can be automatically scheduled.

Differences in O&M permission management for nodes in the production environment

  • Basic mode: Developers can directly modify code of nodes in the production environment.

  • Standard mode (recommended): Developers can only modify and commit node code on the DataStudio page, but cannot directly deploy node code to the production environment. You can deploy node code only after you are assigned the Project Owner, Workspace Manager, or O&M role.

      • You can modify code only in the development environment. You cannot modify code in the production environment.

      • You can plan and control the node development and O&M processes in DataWorks based on the features of workspaces in standard mode and the DataWorks role permission system. For more information, see Sample scenario: Impacts of workspaces in standard mode on usage processes.

Differences in permission management for data in the production environment

  • Basic mode: Developers can directly perform tests by using production data. This cannot ensure the security of production data.

  • Standard mode (recommended): Developers can perform tests by using test data in the development environment. Developers can also verify features by using production table data in the development environment after the developers obtain the required permissions or after their request to perform the operations is approved in Security Center.

    Note

      • Only a MaxCompute data source allows you to request permissions on tables in the production environment in Security Center in a visualized manner. For information about how to manage permissions on MaxCompute, see Manage permissions on data in a MaxCompute compute engine instance.

      • The characteristics of the data source that you add to a DataWorks workspace determine whether resources can be accessed across projects or databases. If you add different data sources to a workspace in the development and production environments, the characteristics of the data sources determine whether you can access objects, such as tables, resources, and functions in the production environment from the development environment.

Differences in data access identities

  • Basic mode: A unified identity is used to directly perform operations in the production environment.

    Access identities for data sources such as MaxCompute, Hologres, E-MapReduce (EMR), and Cloudera's Distribution including Apache Hadoop (CDH): Alibaba Cloud account, RAM user, RAM role (supported only by MaxCompute), and node owner.

    Note

    If you add a data source other than the preceding types of data sources, such as an AnalyticDB for MySQL or AnalyticDB for PostgreSQL data source, to a workspace in this mode, only the database account that you specified when you configure the data source can perform operations in a specific environment. The permissions of this account in the DataWorks workspace are the same as those in the AnalyticDB for MySQL or AnalyticDB for PostgreSQL database.

  • Standard mode (recommended):

      • Development environment: By default, the node executor (current logon user) tests nodes.

      • Production environment: A specified identity is used to schedule nodes. You can find the desired data source and modify the access identity on the Data Sources page in Data Integration.

    Note

    MaxCompute, Hologres, EMR, and CDH

      • Development environment: node owner

      • Production environment: Alibaba Cloud account, RAM user, and RAM role (supported only by MaxCompute)

    If you add a data source other than the preceding types of data sources, such as an AnalyticDB for MySQL or AnalyticDB for PostgreSQL data source, to a workspace in this mode, only the database account that you specified when you configure the data source can perform operations in a specific environment. The permissions of this account in the DataWorks workspace are the same as those in the AnalyticDB for MySQL or AnalyticDB for PostgreSQL database.
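
The commit and deploy rules in this section can be summarized in a short sketch. It is an illustrative model only, not the DataWorks API: the function names are hypothetical, and the role names are plain strings copied from the text above.

```python
# Roles that, according to the text above, may deploy node code in standard mode.
DEPLOY_ROLES = frozenset({"Project Owner", "Workspace Manager", "O&M"})


def commit_target(mode: str) -> str:
    """Return the environment in which a committed node lands."""
    if mode == "basic":
        return "production"   # commit goes straight to the scheduling system
    if mode == "standard":
        return "development"  # a separate deploy step is still required
    raise ValueError(f"unknown mode: {mode!r}")


def deploy(mode: str, role: str) -> str:
    """Deploy a committed node to production (standard mode only)."""
    if mode != "standard":
        raise ValueError("deployment is a separate step only in standard mode")
    if role not in DEPLOY_ROLES:
        raise PermissionError(f"role {role!r} cannot deploy to production")
    return "production"


def is_periodically_scheduled(environment: str) -> bool:
    """By default, only nodes in the production environment are scheduled."""
    return environment == "production"


# A developer commits in standard mode; the node is not yet scheduled.
env = commit_target("standard")
scheduled_after_commit = is_periodically_scheduled(env)
# A user with the O&M role deploys it; now it is scheduled.
env = deploy("standard", "O&M")
scheduled_after_deploy = is_periodically_scheduled(env)
```

Separating `commit_target` from `deploy` mirrors the two-step process in standard mode: committing alone never reaches the production environment, and the deploy step is where the role check applies.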

Advantages and disadvantages of different workspace modes

Advantages

  • Basic mode: Workspaces in this mode are simple and easy to use. You only need to assign the Development role to development engineers for them to complete all data warehouse development operations.

  • Standard mode: Workspaces in this mode are secure and standardized.

      • A secure and standardized process is provided to allow you to deploy and manage node code, including features such as code review and code check by using the diff command. This ensures the stability of the production environment and prevents unexpected scenarios such as dirty data spreading and node errors caused by illogical code.

      • Data activities are managed in an efficient manner and data security is ensured.

Disadvantages

  • Basic mode: The risks of instability and low data security may arise in the production environment.

      • A workspace in basic mode provides only the production environment. As a result, data cannot be isolated and you can perform only basic data development operations.

      • You cannot manage permissions on production tables.

        Note

        If you associate a MaxCompute project with a workspace in basic mode, users that are assigned the Development role have read and write permissions on all tables of the MaxCompute project by default and can add, delete, or modify tables. This increases data security risks.

      • You cannot control the data development process.

        Note

        Users that are assigned the Development role can add, modify, or commit node code to the scheduling system without the need to obtain approval. This may cause instabilities and uncertainties to data production.

  • Standard mode: The data development and production process is more complex. In most cases, the process involves more than one developer.

Sample scenario: Impacts of workspaces in standard mode on usage processes

The development and production isolation feature of a workspace in standard mode affects processes such as data modeling design, data processing, and code deployment.

Appendix: Data sources that are used when you perform operations on different DataWorks service modules in workspaces in basic and standard modes

You can view the information about data sources that are associated with the DataStudio service of a workspace on the Data Source page in DataStudio. The following comparison describes the data sources that are used when you perform operations on different DataWorks service modules in workspaces in basic and standard modes.

DataStudio

  • Standard mode: The data source, such as an instance, project, or database, in the development environment is used.

  • Basic mode: The data source, such as an instance, project, or database, in the production environment is used.

Operation Center

  • Operation Center in the development environment: The data source, such as an instance, project, or database, in the development environment is used.

  • Operation Center in the production environment: The data source, such as an instance, project, or database, in the production environment is used.
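
The mapping above amounts to a lookup from service module and workspace mode to an environment. The following sketch is hypothetical: `environment_for` and its parameters are illustrative names, and the basic-mode result for Operation Center is an assumption based on the statement that workspaces in basic mode provide only the production environment.

```python
MODULES = ("DataStudio", "Operation Center")
MODES = ("basic", "standard")
ENVIRONMENTS = ("development", "production")


def environment_for(module: str, mode: str,
                    operation_center_env: str = "production") -> str:
    """Return the environment whose data source a module uses.

    operation_center_env selects which Operation Center (development or
    production) is in use; it matters only in standard mode.
    """
    if module not in MODULES:
        raise ValueError(f"unknown module: {module!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "basic":
        # Assumption: basic mode provides only the production environment,
        # so every module uses the production data source.
        return "production"
    if module == "DataStudio":
        return "development"
    if operation_center_env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {operation_center_env!r}")
    return operation_center_env
```

For example, `environment_for("DataStudio", "standard")` yields the development environment, while the same call with `"basic"` yields production.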

Appendix: How to isolate data between development and production environments for a workspace in basic mode

Requirement: You use a workspace in basic mode and want to isolate data between the development and production environments.

Solution: Prepare two workspaces in basic mode. Workspace 1 serves as the development environment and Workspace 2 serves as the production environment. Use the cross-workspace deployment method to deploy nodes in Workspace 1 to Workspace 2. This way, data can be isolated between the environments.

Disadvantages: You can still directly modify production code in DataStudio in the workspace that serves as the production environment. As a result, production code can be updated through more than one entry point, which undermines control of the entire development process.

Suggestion: We recommend that you upgrade your workspace from basic mode to standard mode for better control of the development process. For more information, see Scenario: Upgrade a workspace from the basic mode to the standard mode.