This topic describes the workspace system in DataWorks and workspace creation plans for different scenarios.
What is a workspace?
Workspaces are basic units for managing nodes, members, roles, and permissions in DataWorks. The administrator of a workspace can add members to the workspace and assign the workspace administrator, developer, administration expert, deployment expert, security expert, or visitor role to each member. This way, workspace members with different roles can collaborate with each other.
You can bind multiple types of compute engine instances to a workspace, such as MaxCompute, E-MapReduce (EMR), and Realtime Compute for Apache Flink. Then, you can configure and schedule nodes and manage the data that is stored by the compute engine instances in the workspace.
The members that are added to a workspace can assume the following roles: workspace administrator, deployment expert, developer, model developer, visitor, project owner, administration expert, and security expert. Each role has different permissions.
Workspaces in different modes
- Workspaces in basic mode
A workspace in basic mode does not isolate the development environment from the production environment. You can manage the data of each compute engine instance that is bound to this workspace in only one environment. In this workspace, you can perform only basic data development and cannot fully control the data development process or table permissions.
- Advantages: This mode is easy to use and features fast iteration. The code of a node takes effect immediately after you commit the node, without the need to deploy the node.
- Disadvantages: Developers may have excessive permissions. For example, developers can delete tables in the current workspace, which puts data at risk.
- Workspaces in standard mode
A workspace in standard mode isolates the development environment from the production environment. You can manage the data of each compute engine instance that is bound to this workspace in two environments. In addition, you can manage permissions on different environments by assigning different roles. Developers can manage data and nodes only in the development environment. Administration experts can manage nodes only in the production environment. No members can directly access the data in the production environment. Different roles supervise each other to ensure that the workspace conforms to the specifications for data warehouse management.
A workspace in standard mode allows you to strictly control table permissions. Unauthorized members are prohibited from managing tables in the production environment. This ensures data security.
- Advantages: Strict control over permissions and process standardization ensure the security of data and code.
- Disadvantages: You must create a development environment for compute engine instances. The management of permissions and roles is more complex.
- Development environment: Only node owners can run nodes.
- Production environment: An Alibaba Cloud account or a RAM user can run and schedule nodes.
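The environment separation described for standard mode can be sketched as a small lookup. This is an illustrative model only, not the DataWorks permission system: the names `Env`, `Role`, `NODE_MANAGEMENT`, and `can_manage_nodes` are hypothetical, and only the two roles whose environment scope is stated above are modeled.

```python
from enum import Enum


class Env(Enum):
    DEV = "development"
    PROD = "production"


class Role(Enum):
    DEVELOPER = "developer"
    ADMINISTRATION_EXPERT = "administration expert"


# Hypothetical table distilled from the standard-mode description:
# developers manage nodes only in development, administration experts
# manage nodes only in production. Other roles are not modeled here.
NODE_MANAGEMENT = {
    Role.DEVELOPER: {Env.DEV},
    Role.ADMINISTRATION_EXPERT: {Env.PROD},
}


def can_manage_nodes(role: Role, env: Env) -> bool:
    """Return True if the role may manage nodes in the given environment."""
    return env in NODE_MANAGEMENT.get(role, set())


if __name__ == "__main__":
    for role in Role:
        for env in Env:
            print(f"{role.value} / {env.value}: {can_manage_nodes(role, env)}")
```

Keeping the rule in one table makes the mutual supervision between roles easy to see: no single role in the sketch covers both environments.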
Isolation rules for workspaces
|Workspace management||Workspace management is completely isolated among workspaces. Different workspaces can have different administrators and different members. The settings of roles and compute engine instances take effect in each workspace. Note: Only Alibaba Cloud accounts can be workspace owners.|
|DataStudio||Data development is completely isolated among workspaces. Note: A node in a workspace can depend on nodes in other workspaces.|
|Operation Center||The Operation Center service is partially isolated among workspaces.|
|Data Map||The Data Map service is shared by all workspaces of a tenant. In Data Map, you can query and view the metadata of all workspaces of the tenant in the current region. Note: Only the metadata that is displayed in Data Map is shared by all workspaces of a tenant. The read and write permissions on data are not shared. The read and write permissions on the data in the development environment are shared by the developers of each workspace. In the production environment, only the tenant account has the permissions to manage data.|
|Data Quality||The Data Quality service is completely isolated among workspaces. Only the developers, administration experts, and workspace administrators in each workspace have the permissions to configure data quality rules in the workspace.|
|DataService Studio||The DataService Studio service is partially isolated among workspaces. Workspaces share the definition of API groups. However, the APIs that are registered or published in a workspace are visible only in the workspace.|
|Data Security Guard||The Data Security Guard service is shared by all workspaces. All workspaces share a set of data security rules and a set of data security levels for sensitive data. If you select Safe for the Access Mode parameter, only the security experts in each workspace have the permissions to perform operations in Data Security Guard.|
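For quick reference, the isolation rules in this table can be captured as a simple mapping. This is a minimal sketch: the level labels `isolated`, `partial`, and `shared` and the helper `services_by_level` are hypothetical names chosen for illustration, not DataWorks identifiers.

```python
# Isolation level of each service, as stated in the table.
# The three labels are illustrative shorthand, not product terminology.
ISOLATION = {
    "Workspace management": "isolated",
    "DataStudio": "isolated",
    "Operation Center": "partial",
    "Data Map": "shared",
    "Data Quality": "isolated",
    "DataService Studio": "partial",
    "Data Security Guard": "shared",
}


def services_by_level(level: str) -> list:
    """Return the service names at the given isolation level, sorted by name."""
    return sorted(name for name, lv in ISOLATION.items() if lv == level)


if __name__ == "__main__":
    for level in ("isolated", "partial", "shared"):
        print(level, services_by_level(level))
```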
- You can follow uniform naming conventions. For example, the workspace name indicates the business meaning and the names of nodes contain the abbreviation of the workspace to which the nodes belong. This helps you distinguish workspaces and nodes.
- To create a workspace, you must use an Alibaba Cloud account or a tenant administrator that is assigned the workspace administrator role. A member cannot assume the developer and administration expert roles at the same time.
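The two rules above, the naming convention and the role restriction, can be checked with a short script before members and nodes are set up. This is a sketch under assumptions: the function names, the example abbreviation `mkt`, and the example node name are all hypothetical, and the check runs locally without calling any DataWorks API.

```python
# Role pairs that one member must not hold at the same time,
# taken from the rule stated above.
CONFLICTING_ROLES = [frozenset({"developer", "administration expert"})]


def node_name_follows_convention(node_name: str, workspace_abbr: str) -> bool:
    """Check that the node name contains the workspace abbreviation."""
    return workspace_abbr.lower() in node_name.lower()


def has_role_conflict(roles) -> bool:
    """Return True if the roles include a pair that must not be combined."""
    held = {role.lower() for role in roles}
    return any(pair <= held for pair in CONFLICTING_ROLES)


if __name__ == "__main__":
    # "mkt" is a made-up abbreviation for a marketing workspace.
    print(node_name_follows_convention("mkt_daily_orders", "mkt"))
    print(has_role_conflict(["developer", "administration expert"]))
    print(has_role_conflict(["developer", "visitor"]))
```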
|Dimension||By department||By business project||By data warehouse layer|
|Workspace creation basis||You can create workspaces based on the organizational structure of your company. For example, you can create a workspace for each department, such as the production department, marketing department, human resources department, and finance department. In each workspace, each department can develop its data and manage its tables.||You can create workspaces based on the planning of business projects. For example, you can create a workspace for each business project, such as the Quarterly Sales Sprint project, the Production Security Inspection in Spring project, and the Administrator Cockpit Report project. Each business project involves multiple departments. In this case, each workspace ingests data from multiple business systems and generates data by aggregation and processing to support each business project.||You can create one or more workspaces for each data warehouse layer. For example, you can create workspaces for the data access layer (DAL), operational data store (ODS) layer, and data warehouse summary (DWS) layer.|
|Scenarios||You can create workspaces for departments that have simple business needs, own members with development capabilities, and rarely share data. A single department can complete the end-to-end business development.||You can create workspaces for business projects that have a higher priority or require the collaboration of multiple departments.||You can create workspaces for large-sized data warehouses, the common data model (CDM) layer of enterprises, and Data Mid-end.|
|Advantages||Workspace members are added based on the organizational structure. This dimension maximizes the stability of human resources and data security. In addition, the costs of computing and storage are easy to calculate for different departments.||In a workspace, the business needs are specific and the data link is easy to understand. This way, you can adjust the responsibilities of members based on business needs and manage data in an easy way.||The data architecture is clear, data sharing is convenient, fewer development skills are required, and resources can be allocated based on the characteristics of each layer.|
|Disadvantages||The data of different departments tends to be siloed. Data may be repeatedly processed and stored. The dependencies among data from different workspaces are complex and resources tend to be preempted by different workspaces.||The data architecture is unclear and different workspaces implement different business logic. The members of a workspace may come from different departments. This increases risks in data security.||The development cycle and the O&M link are long. Mutually dependent nodes may be developed in different workspaces that correspond to different data warehouse layers. In a workspace in standard mode, if the code of a node needs to be modified and deployed, the code of its dependent nodes in other workspaces also needs to be modified.|
|Flexibility in human resource management||★☆☆☆☆||★★★★★||★★★★☆|
- Data staging area (STG): A workspace is created for each application system, such as stg_Marketing system and stg_Production management system.
- Nodes: only data integration nodes.
- Tables: tables that store only raw data and have a short time to live (TTL).
- Workspace members: database administrators (DBAs) of each application system.
- Resources that are first allocated: resource groups for Data Integration and storage space.
- ODS layer: A workspace is created for each department, such as ods_Human resources department and ods_Production department. At this layer, the data from different departments is standardized and sensitive data is filtered out.
- Nodes: SQL nodes with a single ancestor node and a single descendant node.
- Tables: tables at the ODS layer.
- Workspace members: the data cleansing staff from each department.
- Resources that are first allocated: resource groups for scheduling and computing resources of compute engines. Resource groups for scheduling are first allocated to nodes in the first stage, such as nodes that are scheduled to be run from 00:00:00 to 02:00:00.
- Data warehouse (DW) layer: A single workspace is created for this layer or a workspace is created for each business domain, such as dw_Customer and dw_Commodity.
- Nodes: SQL nodes with multiple ancestor nodes and a single descendant node.
- Tables: fact tables and dimension tables at the DW layer.
- Workspace members: professional developers at the common data model (CDM) layer.
- Resources that are first allocated: resource groups for scheduling, computing resources of compute engines, and storage space. Resource groups for scheduling are first allocated to nodes in the second stage, such as nodes that are scheduled to be run from 02:00:00 to 05:00:00. Storage space is allocated to deal with data bloat.
- Tag data model (TDM) layer: A single workspace is created for this layer or a workspace is created for each business object.
- Nodes: SQL nodes with multiple ancestor nodes and a single descendant node.
- Tables: tag tables.
- Workspace members: professional developers at the CDM layer.
- Resources that are first allocated: resource groups for scheduling, computing resources of compute engines, and storage space. Resource groups for scheduling are first allocated to nodes in the third stage, such as nodes that are scheduled to be run from 05:00:00 to 07:00:00. Storage space is allocated to deal with data bloat.
- Application data store (ADS) layer: A workspace is created for each business project.
- Nodes: SQL nodes and data integration nodes.
- Tables: tables that meet business needs.
- Workspace members: members of each business project.
- Resources that are first allocated: resource groups for scheduling, computing resources of compute engines, and resource groups for Data Integration. Resource groups for scheduling are first allocated to nodes in the final stage, such as nodes that are scheduled to be run from 07:00:00 to 09:00:00.
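The staged scheduling windows in this layered plan can be written down as data, which makes it easy to check which layer a node's start time falls into. This is a minimal sketch: the `Layer` class and `layer_for` function are hypothetical, the windows are the example times given above, and treating each window as half-open (start included, end excluded) is an assumption made so that boundary times belong to exactly one layer.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional, Tuple


@dataclass(frozen=True)
class Layer:
    name: str
    # Example scheduling window (start, end); None when the text gives none.
    window: Optional[Tuple[time, time]]


# Windows copied from the examples in the list above.
LAYERS = [
    Layer("STG", None),
    Layer("ODS", (time(0, 0), time(2, 0))),
    Layer("DW", (time(2, 0), time(5, 0))),
    Layer("TDM", (time(5, 0), time(7, 0))),
    Layer("ADS", (time(7, 0), time(9, 0))),
]


def layer_for(start: time) -> Optional[str]:
    """Return the layer whose window contains the start time, if any.

    Assumption: windows are half-open, so 02:00:00 belongs to DW, not ODS.
    """
    for layer in LAYERS:
        if layer.window and layer.window[0] <= start < layer.window[1]:
            return layer.name
    return None


if __name__ == "__main__":
    for t in (time(1, 30), time(2, 0), time(6, 59, 59), time(9, 0)):
        print(t, layer_for(t))
```

A start time outside all windows returns `None`, which in this sketch simply means the example plan does not assign that time to any stage.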