This topic describes the workspace system in DataWorks and workspace creation plans for different scenarios.

What is a workspace?

Workspaces are the basic units for managing nodes, members, roles, and permissions in DataWorks. The administrator of a workspace can add members to the workspace and assign each member a role, such as workspace administrator, developer, administration expert, deployment expert, security expert, or visitor. This way, workspace members with different roles can collaborate with each other.

You can bind multiple types of compute engine instances to a workspace, such as MaxCompute, E-MapReduce (EMR), and Realtime Compute for Apache Flink. Then, you can configure and schedule nodes and manage the data that is stored by the compute engine instances in the workspace.

The members that are added to a workspace can assume the following roles: workspace administrator, deployment expert, developer, model developer, visitor, project owner, administration expert, and security expert. Each role has different permissions.

Workspaces in different modes

DataWorks provides workspaces in basic mode and standard mode.
  • Workspaces in basic mode
    A workspace in basic mode does not isolate the development environment from the production environment. You can manage the data of each compute engine instance that is bound to this workspace only in a single environment. In this workspace, you can perform only basic data development but cannot completely control the data development process and table permissions.
    • Advantages: This mode is easy to use and features fast iteration. The code of a node takes effect immediately after you commit the node, without the need to deploy the node.
    • Disadvantages: Developers may have excessive permissions. For example, the developers can delete tables in the current workspace. This puts data at risk.
    In a workspace in basic mode, a node can be run by the owner of the node or an Alibaba Cloud account. Then, the generated data is owned by the node owner or the Alibaba Cloud account.
  • Workspaces in standard mode

    A workspace in standard mode isolates the development environment from the production environment. You can manage the data of each compute engine instance that is bound to this workspace in two environments. In addition, you can manage permissions on different environments by assigning different roles. Developers can manage data and nodes only in the development environment. Administration experts can manage nodes only in the production environment. No members can directly access the data in the production environment. Different roles supervise each other to ensure that the workspace conforms to the specifications for data warehouse management.

    A workspace in standard mode allows you to strictly control table permissions. Unauthorized members are prohibited from managing tables in the production environment. This ensures data security.

    • Advantages: Strict control over permissions and process standardization ensure the security of data and code.
    • Disadvantages: You must create a development environment for compute engine instances. The management of permissions and roles is more complex.
    In a workspace in standard mode, the identities for running nodes vary in the development and production environments.
    • Development environment: Only node owners can run nodes.
    • Production environment: An Alibaba Cloud account or a RAM user can run and schedule nodes.
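
The run identities described above can be summarized as a small lookup. The following Python sketch is illustrative only: `RUN_IDENTITIES` and `allowed_run_identities` are hypothetical names, not part of any DataWorks API, and the environment label "single" is an assumption used here for the one environment of a basic-mode workspace.

```python
from __future__ import annotations

# Who may run nodes, keyed by (workspace mode, environment).
# The identities are copied from the descriptions above; every name in this
# sketch is hypothetical and not a DataWorks identifier.
RUN_IDENTITIES = {
    ("basic", "single"): {"node owner", "Alibaba Cloud account"},
    ("standard", "development"): {"node owner"},
    ("standard", "production"): {"Alibaba Cloud account", "RAM user"},
}


def allowed_run_identities(mode: str, environment: str) -> set[str]:
    """Return the identities that may run nodes for a mode and environment."""
    key = (mode, environment)
    if key not in RUN_IDENTITIES:
        raise ValueError(f"unknown combination: mode={mode!r}, environment={environment!r}")
    return RUN_IDENTITIES[key]
```

For example, `allowed_run_identities("standard", "development")` returns only the node owner, which reflects the stricter separation in standard mode.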

Isolation rules for workspaces

The main services of DataWorks have different isolation rules for workspaces, as described in the following list.
  • Workspace management: Workspace management is completely isolated among workspaces.
    Different workspaces can have different administrators and different members. The settings of roles and compute engine instances take effect in each workspace.
    Note: Only Alibaba Cloud accounts can be workspace owners.
  • DataStudio: Data development is completely isolated among workspaces.
    • Workflows and nodes in different workspaces are separately developed.
    • In a workspace:
      • Only developers and workspace administrators have the permissions to create, edit, and delete nodes.
      • Only developers, administration experts, and workspace administrators have the permissions to commit nodes.
      • Only administration experts, deployment experts, and workspace administrators have the permissions to deploy nodes to the production environment.
    Note: A node in a workspace can depend on nodes in other workspaces.
  • Operation Center: The Operation Center service is partially isolated among workspaces.
    • O&M of real-time nodes, auto triggered nodes, and manually triggered nodes: The node O&M is isolated among workspaces. Only the developers, administration experts, and workspace administrators in each workspace have the permissions to manage the nodes in the workspace.
    • Dashboard: The statistics about node O&M are isolated among workspaces. The dashboard displays the statistics about the nodes in a workspace. This way, you can have a clear view of nodes.
    • Alarm: This feature is not isolated among workspaces and allows you to use a baseline to monitor nodes from different workspaces. Only Alibaba Cloud accounts and workspace administrators have the permissions to create baselines.
  • Data Map: The Data Map service is shared by all workspaces of a tenant.
    In Data Map, you can query and view the metadata of all workspaces of the tenant in the current region.
    Note: Only the metadata that is displayed in Data Map is shared by all workspaces of a tenant. The read and write permissions on data are not shared. The read and write permissions on the data in the development environment are shared by the developers of each workspace. In the production environment, only the tenant account has the permissions to manage data.
  • Data Quality: The Data Quality service is completely isolated among workspaces.
    Only the developers, administration experts, and workspace administrators in each workspace have the permissions to configure data quality rules in the workspace.
  • DataService Studio: The DataService Studio service is partially isolated among workspaces.
    Workspaces share the definition of API groups. However, the APIs that are registered or published in a workspace are visible only in the workspace.
  • Data Security Guard: The Data Security Guard service is shared by all workspaces.
    All workspaces share a set of data security rules and a set of data security levels for sensitive data. If you select Safe for the Access Mode parameter, only the security experts in each workspace have the permissions to perform operations in Data Security Guard.
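
The DataStudio rules above amount to a small role-by-action matrix. The following Python sketch models it as a toy example. The action names (`edit_node`, `commit_node`, `deploy_node`) and the helper functions are hypothetical and chosen for this illustration; they are not DataWorks identifiers or API calls.

```python
# Toy permission matrix for the DataStudio rules described above.
# Roles are taken from the text; action names are assumptions for this sketch.
DATASTUDIO_PERMISSIONS = {
    # "edit_node" stands for creating, editing, and deleting nodes.
    "edit_node": {"developer", "workspace administrator"},
    "commit_node": {"developer", "administration expert", "workspace administrator"},
    "deploy_node": {"administration expert", "deployment expert", "workspace administrator"},
}


def can(role: str, action: str) -> bool:
    """Return True if the role is allowed to perform the action."""
    return role in DATASTUDIO_PERMISSIONS.get(action, set())


def roles_for_full_lifecycle() -> set:
    """Return the roles that can edit, commit, and deploy a node on their own."""
    return set.intersection(*DATASTUDIO_PERMISSIONS.values())
```

In this model, `can("developer", "deploy_node")` is `False`, and `roles_for_full_lifecycle()` contains only the workspace administrator. This matches the intent that developers cannot push their own code to the production environment without another role being involved.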

General requirements

Regardless of the dimension based on which you create workspaces, we recommend that you meet the following requirements:
  • Follow uniform naming conventions. For example, make the workspace name indicate its business meaning, and include the abbreviation of the workspace in the names of the nodes that belong to it. This helps you distinguish workspaces and nodes.
  • To create a workspace, you must use an Alibaba Cloud account or a tenant administrator that is assigned the workspace administrator role.
  • A member cannot assume the developer and administration expert roles at the same time.
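
The requirements above can be checked mechanically before a workspace goes live. The following Python sketch is one possible way to do so. It assumes that "contains the abbreviation" means a case-insensitive substring match and that role assignments are available as a plain dictionary. Both function names and the sample data are hypothetical.

```python
# Roles that one member should not hold together, as stated above.
CONFLICTING_ROLES = {"developer", "administration expert"}


def node_name_has_workspace_abbr(node_name: str, workspace_abbr: str) -> bool:
    """Check that a node name contains the workspace abbreviation.

    Assumption: a case-insensitive substring match is enough for the convention.
    """
    return workspace_abbr.lower() in node_name.lower()


def members_with_conflicting_roles(assignments: dict) -> list:
    """Return the members that hold both the developer and administration expert roles."""
    return sorted(
        member
        for member, roles in assignments.items()
        if CONFLICTING_ROLES <= set(roles)
    )


# Hypothetical sample data for illustration.
sample_assignments = {
    "alice": ["developer"],
    "bob": ["developer", "administration expert"],
    "carol": ["administration expert", "deployment expert"],
}
```

With the sample data, `members_with_conflicting_roles(sample_assignments)` returns `["bob"]`, and `node_name_has_workspace_abbr("ods_hr_clean_employee", "ods_hr")` returns `True`.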

Workspace planning

You can create workspaces based on one of the following dimensions: department, business project, or data warehouse layer. You can also create workspaces by taking all the dimensions into account.
The three dimensions compare as follows.
  • Workspace creation basis
    • By department: You can create workspaces based on the organizational structure of your company. For example, you can create a workspace for each department, such as the production department, marketing department, human resources department, and finance department. In each workspace, each department can develop its data and manage its tables.
    • By business project: You can create workspaces based on the planning of business projects. For example, you can create a workspace for each business project, such as the Quarterly Sales Sprint project, the Production Security Inspection in Spring project, and the Administrator Cockpit Report project. Each business project involves multiple departments. In this case, each workspace ingests data from multiple business systems and generates data by aggregation and processing to support each business project.
    • By data warehouse layer: You can create one or more workspaces for each data warehouse layer. For example, you can create workspaces for the data access layer (DAL), operational data store (ODS) layer, and data warehouse summary (DWS) layer.
  • Scenarios
    • By department: You can create workspaces for departments that have simple business needs, own members with development capabilities, and rarely share data. A single department can complete the end-to-end business development.
    • By business project: You can create workspaces for business projects that have a higher priority or require the collaboration of multiple departments.
    • By data warehouse layer: You can create workspaces for large-sized data warehouses, the common data model (CDM) layer of enterprises, and Data Mid-end.
  • Advantages
    • By department: Workspace members are added based on the organizational structure. This dimension maximizes the stability of human resources and data security. In addition, the costs of computing and storage are easy to calculate for different departments.
    • By business project: In a workspace, the business needs are specific and the data link is easy to understand. This way, you can adjust the responsibilities of members based on business needs and manage data in an easy way.
    • By data warehouse layer: The data architecture is clear, data sharing is convenient, fewer development skills are required, and resources can be allocated based on the characteristics of each layer.
  • Disadvantages
    • By department: The data of different departments tends to be siloed. Data may be repeatedly processed and stored. The dependencies among data from different workspaces are complex and resources tend to be preempted by different workspaces.
    • By business project: The data architecture is unclear and different workspaces implement different business logic. The members of a workspace may come from different departments. This increases risks in data security.
    • By data warehouse layer: The development cycle and the O&M link are long. Mutually dependent nodes may be developed in different workspaces that correspond to different data warehouse layers. In a workspace in standard mode, if the code of a node needs to be modified and deployed, the code of its dependent nodes in other workspaces also needs to be modified.
  • Architecture stability: ★★★★★ by department, ★☆☆☆☆ by business project, ★★★★★ by data warehouse layer
  • Flexibility in human resource management: ★☆☆☆☆ by department, ★★★★★ by business project, ★★★★☆ by data warehouse layer
  • Business complexity: ★★☆☆☆ by department, ★★★★☆ by business project, ★★★☆☆ by data warehouse layer
  • Data security: ★★★★★ by department, ★★☆☆☆ by business project, ★★★☆☆ by data warehouse layer
  • Maintainability: ★★☆☆☆ by department, ★★★★★ by business project, ★★☆☆☆ by data warehouse layer
  • Data sharing: ★★★☆☆ by department, ★☆☆☆☆ by business project, ★★★★★ by data warehouse layer
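
One way to use the star ratings above is to weight the criteria that matter most to you and compare the totals. The following Python sketch does this with a simple weighted sum. The weighted-sum method, the function name `rank_dimensions`, and the example weights are assumptions made for illustration and are not a DataWorks recommendation. The source does not state whether more stars for business complexity are desirable, so the example leaves that criterion out of the weights.

```python
CRITERIA = [
    "architecture stability",
    "flexibility in human resource management",
    "business complexity",
    "data security",
    "maintainability",
    "data sharing",
]

# Star counts (1 to 5) copied from the comparison above, in CRITERIA order.
RATINGS = {
    "department": dict(zip(CRITERIA, (5, 1, 2, 5, 2, 3))),
    "business project": dict(zip(CRITERIA, (1, 5, 4, 2, 5, 1))),
    "data warehouse layer": dict(zip(CRITERIA, (5, 4, 3, 3, 2, 5))),
}


def rank_dimensions(weights: dict) -> list:
    """Rank dimensions by the weighted sum of their star ratings, best first."""
    unknown = set(weights) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    scores = {
        dimension: sum(weight * stars[criterion] for criterion, weight in weights.items())
        for dimension, stars in RATINGS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, `rank_dimensions({"data security": 2, "data sharing": 1})` returns `[("department", 13), ("data warehouse layer", 11), ("business project", 5)]`. A team that values data sharing and architecture stability equally would instead see the data warehouse layer dimension ranked first.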
You can also combine the three dimensions to benefit from the advantages of each. A commonly used strategy takes data warehouse layers as the primary basis and applies the other dimensions within each layer. In this strategy, one or more workspaces are created for each layer.
  • Data staging area (STG): A workspace is created for each application system, such as stg_Marketing system and stg_Production management system.
    • Nodes: only data integration nodes.
    • Tables: tables that store only raw data and have a short time to live (TTL).
    • Workspace members: database administrators (DBAs) of each application system.
    • Resources that are first allocated: resource groups for Data Integration and storage space.
  • ODS layer: A workspace is created for each department, such as ods_Human resources department and ods_Production department. At this layer, the data from different departments is standardized and sensitive data is filtered out.
    • Nodes: SQL nodes with a single ancestor node and a single descendant node.
    • Tables: tables at the ODS layer.
    • Workspace members: the data cleansing staff from each department.
    • Resources that are first allocated: resource groups for scheduling and computing resources of compute engines. Resource groups for scheduling are first allocated to nodes in the first stage, such as nodes that are scheduled to be run from 00:00:00 to 02:00:00.
  • Data warehouse (DW) layer: A single workspace is created for this layer or a workspace is created for each business domain, such as dw_Customer and dw_Commodity.
    • Nodes: SQL nodes with multiple ancestor nodes and a single descendant node.
    • Tables: fact tables and dimension tables at the DW layer.
    • Workspace members: professional developers at the common data model (CDM) layer.
    • Resources that are first allocated: resource groups for scheduling, computing resources of compute engines, and storage space. Resource groups for scheduling are first allocated to nodes in the second stage, such as nodes that are scheduled to be run from 02:00:00 to 05:00:00. Storage space is allocated to deal with data bloat.
  • Tag data model (TDM) layer: A single workspace is created for this layer or a workspace is created for each business object.
    • Nodes: SQL nodes with multiple ancestor nodes and a single descendant node.
    • Tables: tag tables.
    • Workspace members: professional developers at the CDM layer.
    • Resources that are first allocated: resource groups for scheduling, computing resources of compute engines, and storage space. Resource groups for scheduling are first allocated to nodes in the third stage, such as nodes that are scheduled to be run from 05:00:00 to 07:00:00. Storage space is allocated to deal with data bloat.
  • Application data store (ADS) layer: A workspace is created for each business project.
    • Nodes: SQL nodes and data integration nodes.
    • Tables: tables that meet business needs.
    • Workspace members: members of each business project.
    • Resources that are first allocated: resource groups for scheduling, computing resources of compute engines, and resource groups for Data Integration. Resource groups for scheduling are first allocated to nodes in the final stage, such as nodes that are scheduled to be run from 07:00:00 to 09:00:00.
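
The layered plan above can be written down as data, which makes the staged scheduling windows easy to inspect. The following Python sketch is a minimal illustration. The class `LayerPlan`, the helper functions, and the choice to treat each window as start-inclusive and end-exclusive are assumptions, because the example windows in the text share their boundaries. The STG layer has no window because the text does not give one.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass(frozen=True)
class LayerPlan:
    layer: str
    workspace_per: str             # what each workspace corresponds to
    window_start: Optional[time]   # example scheduling window from the text
    window_end: Optional[time]     # None if the text gives no window


LAYER_PLANS = [
    LayerPlan("STG", "application system", None, None),
    LayerPlan("ODS", "department", time(0, 0), time(2, 0)),
    LayerPlan("DW", "layer or business domain", time(2, 0), time(5, 0)),
    LayerPlan("TDM", "layer or business object", time(5, 0), time(7, 0)),
    LayerPlan("ADS", "business project", time(7, 0), time(9, 0)),
]


def layer_for(run_at: time) -> Optional[str]:
    """Return the layer whose example window contains run_at, or None."""
    for plan in LAYER_PLANS:
        if plan.window_start is None or plan.window_end is None:
            continue
        # Assumption: start is inclusive and end is exclusive.
        if plan.window_start <= run_at < plan.window_end:
            return plan.layer
    return None


def windows_are_contiguous() -> bool:
    """Check that each window starts exactly where the previous one ends."""
    windows = [
        (p.window_start, p.window_end)
        for p in LAYER_PLANS
        if p.window_start is not None
    ]
    return all(prev_end == start for (_, prev_end), (start, _) in zip(windows, windows[1:]))
```

Here `layer_for(time(1, 30))` returns `"ODS"` and `layer_for(time(2, 0))` returns `"DW"`. The check `windows_are_contiguous()` returns `True` for the example windows, which shows that each stage hands over to the next without a gap.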