DataWorks provides workspaces in basic mode and standard mode so that you can develop data under different security control requirements. This topic describes the differences between the two modes, including their physical architectures and their impacts on node development.
Background information
The following table describes the sections of this topic.
Section | Description |
Workspaces in basic and standard modes | Describes the physical architectures of workspaces in basic and standard modes. |
Impacts of different workspace modes on development and O&M of nodes in the production environment | Describes mechanisms for node development and O&M based on the physical attributes of the workspace to which nodes belong. |
Advantages and disadvantages of different workspace modes | Describes the advantages and disadvantages of different workspace modes. |
Appendix: Compute engines that correspond to different DataWorks service modules in workspaces in basic and standard modes | Describes the compute engines that are used when you perform operations on different DataWorks service modules in workspaces in basic and standard modes. Workspaces in basic mode provide only the production environment. Workspaces in standard mode provide both the development and production environments. |
Appendix: How to isolate data between development and production environments for a workspace in basic mode | Refer to this section if you use a workspace in basic mode and want to isolate data between the development and production environments. |
Precautions
Workspaces in different modes have different requirements for compute engines. For example, for a workspace in standard mode, you must associate physically isolated compute engines with the development and production environments so that data is isolated between the two environments. For information about how to associate compute engines with a DataWorks workspace, see Associate compute engines with a workspace and manage the compute engines.
The features of the compute engine that you associate with a DataWorks workspace determine whether resources can be accessed across projects or databases. If you associate different instances, projects, or databases with the development and production environments of a workspace, the features of these compute engines determine whether you can access objects in the production environment, such as tables, resources, and functions, from the development environment.
By default, in a workspace in standard mode, nodes in the development environment are not periodically scheduled. Nodes are periodically scheduled only after they are deployed to the production environment.
Workspaces in basic and standard modes
You can create a workspace in basic or standard mode based on your business requirements. We recommend that you use a workspace in standard mode for data development. In a workspace in standard mode, code, computing resources, permission management, and node deployment process control are separated between the development and production environments.
If you use a workspace in basic mode and want to switch to standard mode while retaining the code in the workspace, you can upgrade the workspace. For more information, see Scenario: Upgrade a workspace from the basic mode to the standard mode.
The following table compares the physical architectures of workspaces in basic and standard modes.
Aspect | Basic mode | Standard mode (recommended) |
Number of associated compute engines | One DataWorks workspace corresponds to one compute engine, such as a project, instance, or database. | One DataWorks workspace corresponds to two compute engines, such as projects, instances, or databases: one for the development environment and one for the production environment. Note: To isolate data between the environments, the two compute engines must be physically isolated. |
DataWorks environment | One compute engine, such as a project, instance, or database, serves as the DataWorks production environment. | One compute engine, such as a project, instance, or database, serves as the DataWorks development environment, and the other serves as the DataWorks production environment. Note: You can associate different compute engines with the development and production environments of the workspace. |
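The engine layout in the preceding table can be expressed as a small validation model. The following Python code is a hypothetical sketch, not a DataWorks API or SDK: `Mode`, `ComputeEngine`, `Workspace`, and `validate` are names invented for illustration, and comparing engine identifiers is only a stand-in for physical isolation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Mode(Enum):
    BASIC = "basic"
    STANDARD = "standard"


@dataclass(frozen=True)
class ComputeEngine:
    # Hypothetical identifier for a project, instance, or database.
    kind: str  # for example, "MaxCompute"
    name: str  # for example, a project name


@dataclass
class Workspace:
    mode: Mode
    prod_engine: ComputeEngine
    dev_engine: Optional[ComputeEngine] = None

    def validate(self) -> None:
        """Check the engine layout described in the table (illustrative only)."""
        if self.mode is Mode.BASIC:
            # Basic mode: one engine, and it serves as the production environment.
            if self.dev_engine is not None:
                raise ValueError("basic mode has no separate development engine")
        else:
            # Standard mode: two engines, one per environment.
            if self.dev_engine is None:
                raise ValueError("standard mode requires a development engine")
            # Assumption: distinct identifiers stand in for physical isolation.
            if self.dev_engine == self.prod_engine:
                raise ValueError("development and production engines must be isolated")

    def environments(self) -> Dict[str, ComputeEngine]:
        """Map each DataWorks environment to the engine that backs it."""
        envs = {"production": self.prod_engine}
        if self.mode is Mode.STANDARD and self.dev_engine is not None:
            envs["development"] = self.dev_engine
        return envs
```

For example, a standard-mode workspace bound to two MaxCompute projects with the hypothetical names `sales_dev` and `sales_prod` passes `validate()`, while binding the same project to both environments raises `ValueError`.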
Impacts of different workspace modes on development and O&M of nodes in the production environment
Item | Basic mode | Standard mode (recommended) |
Differences in development process control for nodes in the production environment | After you commit a node, the node enters the scheduling system and is periodically scheduled to generate data. (Commit to the production environment) | You must first commit a node to the development environment and then deploy the node to the production environment for automatic scheduling. (Commit to the development environment and deploy to the production environment) Note: For a workspace in standard mode, only nodes in the production environment can be automatically scheduled. |
Differences in O&M permission management for nodes in the production environment | Developers can directly modify the code of nodes in the production environment. | Developers can only modify and commit node code on the DataStudio page and cannot directly deploy node code to the production environment. Only users who are assigned the Project Owner, Workspace Manager, or O&M role can deploy node code. |
Differences in permission management for data in the production environment | Developers can directly perform tests by using production data, so the security of production data cannot be ensured. | Developers can perform tests by using test data in the development environment. Developers can also verify features by using production table data in the development environment after they obtain the required permissions or after their access request is approved in Security Center. |
Differences in data access identities | A unified identity is used to directly perform operations in the production environment. Access identities for MaxCompute, Hologres, E-MapReduce (EMR), and Cloudera's Distribution including Apache Hadoop (CDH) compute engines: Alibaba Cloud account, RAM user, RAM role (supported only by MaxCompute), and node owner. Note: If you associate a compute engine other than MaxCompute, Hologres, EMR, or CDH, such as an AnalyticDB for MySQL or AnalyticDB for PostgreSQL compute engine, with a workspace in this mode, only the database account that you specified when you configured the compute engine can perform operations in the environment. This account has the same permissions in the DataWorks workspace as it has in the AnalyticDB for MySQL or AnalyticDB for PostgreSQL database. | MaxCompute, Hologres, EMR, and CDH. Note: If you associate a compute engine other than MaxCompute, Hologres, EMR, or CDH, such as an AnalyticDB for MySQL or AnalyticDB for PostgreSQL compute engine, with a workspace in this mode, only the database account that you specified when you configured the compute engine can perform operations in the specific environment. This account has the same permissions in the DataWorks workspace as it has in the AnalyticDB for MySQL or AnalyticDB for PostgreSQL database. |
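The commit and deploy rules in the preceding table can be sketched as a small state model. This is a hypothetical illustration, not how DataWorks is implemented: `Node`, `commit`, `deploy`, and `DEPLOY_ROLES` are invented names, and in DataWorks these permission checks are performed by the service rather than by user code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class Mode(Enum):
    BASIC = "basic"
    STANDARD = "standard"


# Roles that, according to the table, may deploy node code in standard mode.
DEPLOY_ROLES = {"Project Owner", "Workspace Manager", "O&M"}


@dataclass
class Node:
    name: str
    mode: Mode
    environment: str = "none"  # "none", "development", or "production"

    @property
    def scheduled(self) -> bool:
        # Only nodes in the production environment are periodically scheduled.
        return self.environment == "production"

    def commit(self) -> None:
        # Basic mode: a commit goes straight to production.
        # Standard mode: a commit lands in the development environment.
        self.environment = "production" if self.mode is Mode.BASIC else "development"

    def deploy(self, user_roles: Set[str]) -> None:
        # Assumption: basic mode has no separate deploy step, because the
        # table describes commits going directly to production.
        if self.mode is Mode.BASIC:
            raise RuntimeError("basic mode has no separate deploy step")
        if self.environment != "development":
            raise RuntimeError("commit the node before deploying it")
        if not (user_roles & DEPLOY_ROLES):
            raise PermissionError("deploying requires Project Owner, Workspace Manager, or O&M")
        self.environment = "production"
```

In this model, a developer who holds only the Development role can commit a standard-mode node, but the node stays unscheduled until a user with one of the deploy roles calls `deploy`.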
Advantages and disadvantages of different workspace modes
Item | Basic mode | Standard mode |
Advantages | Workspaces in this mode are simple and easy to use. You only need to assign the Development role to development engineers for them to complete all data warehouse development operations. | Workspaces in this mode are secure and standardized. |
Disadvantages | The production environment may be unstable, and the security of production data may be compromised. | The data development and production process is more complex and, in most cases, involves more than one developer. |
Appendix: Compute engines that correspond to different DataWorks service modules in workspaces in basic and standard modes
You can view the compute engines that are associated with your workspace in the Compute Engine Information section of the Workspace Management page. The following table describes the compute engines that are used when you perform operations on different DataWorks service modules in workspaces in basic and standard modes.
Service module | Standard mode | Basic mode |
DataStudio | The compute engine such as an instance, project, or database in the development environment is used. | The compute engine such as an instance, project, or database in the production environment is used. |
Operation Center |
|
Appendix: How to isolate data between development and production environments for a workspace in basic mode
Requirement: You use a workspace in basic mode and want to isolate data between the development and production environments.
Solution: Prepare two workspaces in basic mode. Workspace 1 serves as the development environment, and Workspace 2 serves as the production environment. Use cross-workspace deployment to deploy nodes from Workspace 1 to Workspace 2. This way, data is isolated between the environments.
Disadvantages: Users can still modify production code directly in DataStudio of the workspace that serves as the production environment. As a result, production code can be updated from more than one entry point, which disrupts the consistency of the entire development process.
Suggestion: We recommend that you upgrade your workspace from basic mode to standard mode for better control of the development process. For more information, see Scenario: Upgrade a workspace from the basic mode to the standard mode.