This topic describes the features that DataWorks provides on the DataStudio page. It helps you understand the overall layout and modules of the DataStudio page and find related topics.

Go to the DataStudio page

  1. Log on to the DataWorks console, click Workspaces in the left-side navigation pane, and then select a region.
  2. Find the workspace that you want to manage and click DataStudio in the Actions column to go to the DataStudio page of the workspace.
On the DataStudio page, you can create workflows and different types of nodes for data development based on your business requirements. For more information, see Manage workflows and Create a batch sync node.
The available features vary by data development operation. The following sections describe these features.

Overall layout of the DataStudio page

The following figure shows the overall layout of the DataStudio page. Overall layout
Section Description
1
2 In this section, you can click the More icon to show or hide the names of the module tabs in the left-side navigation pane.
  • Scheduled Workflow: On this tab, you can create auto triggered nodes that use different compute engines for data development. The nodes created on this tab can be deployed to the production environment for O&M.
    Note Before you can use a specific compute engine for data development, you must associate your workspace with the compute engine.
  • Manually Triggered Workflows: On this tab, you can develop manually triggered nodes. The nodes created on this tab can be deployed to the production environment for O&M.
  • Operating history: On this tab, you can view the records of the nodes that were run in DataStudio within the previous three days.
  • Ad Hoc Query: On this tab, you can perform a simple ad hoc query to test your code. However, the code of an ad hoc query cannot be deployed to the production environment for O&M.
  • Tenant Tables: On this tab, you can view all production tables of the current Alibaba Cloud account.
  • Workspace Tables: On this tab, you can perform operations on a table in a visualized manner. The operations that you can perform on a table must be supported by the compute engine used to create the table.
  • Built-in Functions: On this tab, you can view the descriptions of all built-in MaxCompute functions.
  • Recycle Bin: On this tab, you can manage the nodes, resources, and functions that are removed from the Scheduled Workflow or Manually Triggered Workflows tab.
  • Snippets: A script template (snippet) is a predefined block of SQL code that has multiple input and output parameters. Each SQL code block references one or more source tables. You can filter, join, and aggregate the source table data to generate the table that a new business requirement needs.
  • Operation History: On this tab, you can filter and view historical operation records in the current workspace by operation type, operator, and operation time.
  • Operation Check: On this tab, you can filter and view operations by operation type and check status.
  • MaxCompute:
    • MaxCompute Resources: On this tab, you can manage the existing MaxCompute resources and view the operation records of a specific MaxCompute resource. In addition, you can add a MaxCompute resource that is not uploaded in DataWorks to the Scheduled Workflow tab for management.
    • MaxCompute Functions: On this tab, you can manage the existing MaxCompute functions and view the operation records of a specific MaxCompute function. In addition, you can add a MaxCompute function that is not registered with DataWorks to the Scheduled Workflow tab for management.
Note If a specific module is not displayed in the left-side navigation pane, you can click the Settings icon in Section 4 to add the module on the Settings page. For more information, see Personal settings.
3
DataStudio shortcuts to other services:
  • Node Config: You can click Node Config to manage custom nodes and wrappers to meet your requirements for data development and data quality. After you create a custom node, you can write an SQL statement for the node in DataStudio. DataWorks parses and executes the SQL statement based on the processing logic of the wrapper that you define for the node in the background. Before you create a custom node, you must create a wrapper to define the processing logic of the node.
  • Deploy: You can click Deploy to deploy a node that is developed in DataStudio to the production environment. You can manage the deployment process of the node.
  • Cross-project cloning: You can click Cross-project cloning to clone and migrate nodes such as compute nodes and synchronization nodes between workspaces.
  • Operation Center: You can click Operation Center to perform O&M operations on nodes in Operation Center. In Operation Center, you can switch between the development environment and the production environment. You can perform O&M operations on deployed nodes in Operation Center in the production environment.
Common features of DataWorks services:
Note DataWorks services share common features. The following content describes the common features that are provided by DataWorks on the DataStudio page.
  • Notification Center (Notification Center icon): You can click this icon to obtain the latest updates of DataWorks at the earliest opportunity.
  • Helps (Helps icon): You can click this icon to obtain information about how to use a specific feature based on your business requirements.
  • Workspace Manage (Workspace Manage icon): You can click this icon to go to the Workspace Management page. On this page, you can view the basic information, scheduling properties, security settings, and associated compute engines of the workspace. For more information, see Configure a workspace.
  • Language switch: You can click the current language and switch to another language. For example, you can switch from Chinese to English.
  • Account information: You can click the account to view the personal information of the account and the status statistics about nodes in the workbench.
4 After you click the Settings icon in Section 4, you can set system configurations on the following tabs of the Settings page:
  • Personal Settings: On this tab, you can manage DataStudio modules, editor settings, and general settings, such as the DataWorks theme.
  • Code Templates: On this tab, you can modify a code template to a required style.
  • Scheduling Settings: On this tab, you can enable the periodic scheduling feature and configure the default scheduling settings for auto triggered nodes. Auto triggered nodes can be run as scheduled only after the periodic scheduling feature is enabled.
  • Table Management: On this tab, you can manage settings such as partition formats, identifiers of partition fields, prefixes of table names, table folders, and table levels.
  • Workspace Backup and Restoration: On this tab, you can compress and download code to back up the code of a workspace. You can also upload the code package and use the package to restore the code that is accidentally deleted.
  • Security Settings and Others:
    • Security settings: You can specify whether to mask sensitive information in the returned results of queries that you perform in DataStudio in the current workspace.
    • Other settings: You can enable forcible code review and specify one or more code reviewers to manage code quality of your nodes.
5 This section displays the keyboard shortcuts that are commonly used in the DataStudio editor. For more information about the keyboard shortcuts, see Editor shortcuts.

Features related to workflows

By default, the Scheduled Workflow tab appears when you go to the DataStudio page. On the Scheduled Workflow tab, you must create a workflow before you can organize your data development operations. For more information about how to create a workflow, see the "Create a workflow" section in the Manage workflows topic. The following figure shows the features related to workflows. Workflow-related features
Section Description
1
  • Solution: You can create a solution to manage multiple workflows. A workflow can be added to different solutions. Solutions can be displayed by using lists and cards on a GUI.
  • Business Flow: A workflow is an abstract business entity. You can create a workflow to organize code development operations based on the business requirements.
Click the All Solutions or All Workspaces icon to show all solutions or workflows in the current workspace.
2
  • Refresh (Refresh icon): After you modify a workflow or solution, you can click this icon to refresh the corresponding directory tree.
  • Locate (Locate icon): You can click this icon to find the current node on the Scheduled Workflow tab.
  • Search Code (Search Code icon): You can click this icon to search for a code snippet by using keywords. This way, you can find all nodes that contain the code snippet on the Scheduled Workflow, Manually Triggered Workflows, Ad Hoc Query, and Recycle Bin tabs and view the details of the code snippet in a centralized manner. You can also use this feature to identify the node that causes changes to a table.
  • Batch Operation (Batch Operation icon): You can click this icon to modify the configurations of multiple nodes, resources, and functions at a time. The configurations include the owner, engine instance, resource group for scheduling, rerun properties, scheduling type, recurrence, and scheduling timeout period.
  • Import (Import icon): You can click this icon to upload the data of a local file to a table in DataWorks. Take note that you can import the data of a local file only to a MaxCompute table.
  • Create (Create): You can click Create to quickly create workflows, nodes, tables, resources, and functions.
  • Solution and workflow directory trees:
    • All: This directory tree displays all created objects, including nodes, resources, and functions, in the current workspace by solution and workflow.
    • Owned by Me: This directory tree displays the objects, including nodes, resources, and functions, that are owned by the current account by solution and workflow.
    • My Favorites: This directory tree displays the objects, including nodes, resources, and functions, that are added to favorites by the current account by solution and workflow.
  • Node search:
    • Exact search: You can enter the name of a node or the identifier of a node creator in the search box and click the Search icon to search for the specified node.
    • Search by node type: You can click the Filter icon to specify the types of nodes that you want to search for. After you specify a node type, the directory tree displays only nodes of the specified type in the current workspace.
      Note You can determine whether to hide engine instances or node folders based on your business requirements. After you select Hide Engine Instances or Hide Node Folders, engine instances or node folders are not displayed in the directory tree.
      • Hide Engine Instances and Hide Node Folders are applicable only to the latest version of workflows.
      • Generally, if an engine contains only one engine instance, we recommend that you hide the engine instance.
      • If you do not need to use node folders, such as Data Analytics, Table, Resource, and Function, you can hide them.
Note Before you perform data development operations in a new workspace, you must create a workflow and a node in the workflow. For more information about how to create a workflow, see the "Create a workflow" section in the Manage workflows topic.
3 In this section, you can use a directory tree to manage the nodes, tables, resources, and functions in each workflow.
  • Workflow: the unit for business development.
  • Node: the smallest unit for code development. You can develop code by node type, such as engine nodes, algorithm nodes, Data Integration nodes, database nodes, general nodes, or custom nodes.
  • Table: You can manage tables in DataStudio in a visualized manner.
  • Resource: You can upload resources in DataStudio in a visualized manner.
    Note Only MaxCompute, E-MapReduce (EMR), and Cloudera Distribution Hadoop (CDH) engines support visualized uploading of resources.
  • Function: You can register functions in a visualized manner.
    Note Only MaxCompute, EMR, and CDH engines support visualized registration of functions.
The icon before the name of a node indicates the status of the node:
  • Not Committed icon: indicates that the node is not committed. You can click this icon to commit the node.
  • Not Deployed icon: indicates that the node is not deployed. You can click this icon to deploy the node.
The time when the node was last edited is displayed after the node name.
Double-click the name of a workflow to go to the configuration tab of the workflow, as shown in Sections 5 to 8. On this tab, you can perform data development operations.
4 Resource Group Orchestration (Resource Group Orchestration icon): You can click this icon to change the resource groups for scheduling used by multiple nodes in a workflow during data development. If multiple resource groups for scheduling are used in your workspace, you can use this feature to change the resource groups for scheduling for the nodes in the workspace based on your business requirements. This helps you improve resource usage. After you change the resource groups for scheduling used by multiple nodes, you must deploy the nodes to the production environment so that the change can take effect in the production environment.
5
  • Common Nodes: This section displays the common types of nodes in the current workspace. This helps you quickly select a node type and create a node.
  • Node Group: You can use this feature to reference a set of nodes across workflows. You can add nodes that are frequently used in a workflow to a node group and reuse the node group in other workflows.
  • Quick node creation: You can drag nodes in sections, such as Data Integration, MaxCompute, and EMR, to the right-side canvas of a workflow to create the nodes in the workflow.
6 Tools on the canvas:
  • Switch Layout (Switch Layout icon): You can click this icon to switch the layout of the canvas to Vertical, Horizontal, or Grid.
  • Box (Box icon): You can click this icon to select nodes to form a node group and perform operations on the node group to manage selected nodes.
  • Refresh (Refresh icon): After you modify a workflow, you can click this icon to refresh the workflow.
  • Format (Format icon): You can click this icon to horizontally align the nodes on the canvas.
  • Adapt (Adapt icon): You can click this icon to adapt the current workflow layout to the size of the canvas.
  • Center (Center icon): You can click this icon to center nodes on the canvas.
  • 1:1 (1:1 icon): You can click this icon to change the scale of the directed acyclic graph (DAG) of nodes to 100%.
  • Zoom In (Zoom In icon): You can click this icon to zoom in on the nodes in the current workflow.
  • Zoom Out (Zoom Out icon): You can click this icon to zoom out from the nodes in the current workflow.
  • Search (Search icon): You can click this icon and enter a keyword in the search box to search for a node whose name contains the keyword.
    Note Fuzzy match is supported. After you enter a keyword, DataWorks displays all nodes whose names contain the keyword in the current workflow.
  • Toggle Full Screen View (Toggle Full Screen View icon): You can click this icon to view the current workflow in full screen.
  • Hide Engine Information (Hide Engine Information icon): You can click this icon to show or hide the engine information of each node.
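The fuzzy match used by the canvas search can be pictured as a simple "name contains keyword" filter. The following Python sketch is only an illustration and is not DataWorks code: the function name, the sample node names, and the case-insensitive comparison are all assumptions.

```python
def search_nodes(node_names, keyword):
    """Return the node names that contain the keyword.

    Case-insensitive matching is an assumption made for this sketch.
    """
    needle = keyword.lower()
    return [name for name in node_names if needle in name.lower()]


# Hypothetical node names in one workflow.
nodes = ["ods_orders_sync", "dwd_orders_clean", "ads_user_report"]
print(search_nodes(nodes, "orders"))
# ['ods_orders_sync', 'dwd_orders_clean']
```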
7 Tabs in the right-side navigation pane:
  • Workflow Parameters: You can click this tab and assign a value to a variable in the code for all ODPS SQL nodes in the current workflow at a time.
  • Change History: You can click this tab and view the operation records of nodes in the current workflow.
  • Versions: Each time nodes in the workflow are committed, a new version is generated for the workflow. You can click this tab and view all versions and version details of the workflow.
8 Tools in the toolbar and tools above the configuration tab:
  • Submit (Submit icon): You can click this icon to commit one or more updated nodes in the current workflow to the Deploy page.
  • Run (Run icon): You can click this icon to run all nodes in the current workflow.
  • Stop (Stop icon): If a workflow is running, you can click this icon to stop the nodes from running in the workflow.
  • Deploy (Deploy icon): You can click this icon to go to the Deploy page and view the nodes to be deployed in the current workflow. Then, you can deploy nodes based on your business requirements.
  • Go to Operation Center (Go to Operation Center icon): You can click this icon to go to Operation Center to view the O&M details of nodes.
  • View opened configuration tabs: If you have opened multiple configuration tabs on the DataStudio page, you can click the Downward arrow icon to view all open configuration tabs in a drop-down list.
  • Close opened configuration tabs: You can click the More icon to close one or more configuration tabs.

Shortcut menu related to workflows

Move the pointer over a workflow and right-click the workflow. The following figure shows the shortcut menu that appears, and the following table describes the commands supported by the shortcut menu. Workflow-related shortcut menu
Command Description
Create Node This command allows you to quickly create nodes of different types.
When you create a node, the system displays the recently used node types. If you click one of these node types, the system automatically sets the Engine Instance and Node Type parameters based on the most recently used node of that type. This way, you can quickly create another node of a recently used type.
Create Table This command allows you to quickly create tables of different types.
Create Resource This command allows you to quickly create resources of different engine types.
Note This command supports only MaxCompute, CDH, and EMR resources.
Create Function This command allows you to quickly create functions of different engine types.
Note This command supports only MaxCompute, CDH, and EMR functions.
Board This command navigates you to the canvas of a workflow.
Modify Workflow This command allows you to modify the name, owner, and description of a workflow.
Delete Workflow This command allows you to delete the current workflow.
Note If you perform this operation, all objects in the workflow will be deleted. Proceed with caution.
The following options are available to cope with situations where an object cannot be deleted:
  • Terminate the Delete Operation: By default, this option is selected. If an object cannot be deleted, the delete operation will be terminated. This operation does not affect the deleted objects.
  • Skip Current Object and Continue to Delete Other objects: If an object cannot be deleted, the system skips the object and continues to delete other objects.
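The difference between the two options can be sketched in a few lines of Python. This is a hypothetical model for illustration, not how DataWorks implements deletion: `delete_workflow_objects`, `can_delete`, `skip_failures`, and the object names are invented here.

```python
from dataclasses import dataclass, field


@dataclass
class DeleteResult:
    deleted: list = field(default_factory=list)
    failed: list = field(default_factory=list)
    terminated: bool = False


def delete_workflow_objects(objects, can_delete, skip_failures=False):
    """Delete objects in order, applying one of the two failure policies.

    objects:       names of the objects in the workflow (hypothetical).
    can_delete:    callable that returns True if an object can be deleted;
                   a stand-in for whatever check the service performs.
    skip_failures: False -> terminate at the first failure (the default);
                   True  -> skip the failing object and continue.
    """
    result = DeleteResult()
    for obj in objects:
        if can_delete(obj):
            result.deleted.append(obj)
            continue
        result.failed.append(obj)
        if not skip_failures:
            # Objects deleted before this point stay deleted.
            result.terminated = True
            break
    return result


objects = ["node_a", "table_b", "node_c"]
blocked = {"table_b"}

stop = delete_workflow_objects(objects, lambda o: o not in blocked)
skip = delete_workflow_objects(objects, lambda o: o not in blocked, skip_failures=True)
print(stop.deleted, stop.terminated)  # ['node_a'] True
print(skip.deleted, skip.failed)      # ['node_a', 'node_c'] ['table_b']
```

With the default policy, `node_c` is never attempted because the operation stops at `table_b`. With the skip policy, only `table_b` remains.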
Perform operations on multiple DataWorks objects at a time This command allows you to modify the configurations of multiple nodes, resources, or functions at a time. For example, you can modify the owners, engine instances, and scheduling properties of multiple objects at a time. This command also allows you to commit and deploy multiple modified objects to the production environment at a time.

Features related to nodes

After you create a workflow, you can create different types of nodes for data development based on your requirements. For more information, see the "Select a data development node" section in the Overview topic. Different types of nodes provide similar features. In this section, an ODPS SQL node is used as an example to describe the features that are provided by DataWorks on the node configuration tab. Node-related features
Section Description
1 Node development-related features in the top toolbar:
  • Save (Save icon): You can click this icon to save the code and configurations of the current node.
  • Save as Ad-Hoc Query Node (Save as Ad-Hoc Query Node icon): You can click this icon to save the current code as an ad hoc query. Then, you can view the ad hoc query on the Ad Hoc Query tab. For more information, see Create an ad hoc query.
  • Submit (Submit icon): You can click this icon to commit the current node.
  • Unlock (Unlock icon): You can click this icon to commit the current node and allow other users to modify the code of the node.
  • Steal Lock (Steal Lock icon): If you are not the owner of the node but you want to modify the node, click this icon.
  • Run (Run icon): You can click this icon to run the code of the current node. Values need to be assigned to the variables in the SQL statements only once. If you modify the code, the variables in the code still use the initial values that you assign.
    Note If no resource group for scheduling is specified for the node, DataWorks prompts you to select a resource group for scheduling after you click this icon.
  • Run with Parameters (Run with Parameters icon): You can click this icon to run the code of the current node based on the configured parameters. Each time you click the Run with Parameters icon, you must assign values to variables in SQL statements. DataWorks automatically obtains the initial values when you click the icon. After you assign values to custom parameters, DataWorks replaces the initial values with the values assigned to the custom parameters.
    Note If no resource group for scheduling is specified for the node, DataWorks prompts you to select a resource group for scheduling after you click this icon.
  • Stop (Stop icon): You can click this icon to stop the node that is running.
  • Reload (Reload icon): You can click this icon to refresh the current node configuration tab and return to the node configuration tab that is last saved.
  • Perform Smoke Testing in Development Environment (Perform Smoke Testing in Development Environment icon): You can click this icon to test the code of the current node in the development environment. Smoke testing in the development environment allows you to simulate the value replacement of scheduling parameters in the production environment. After you select a data timestamp, DataWorks replaces the values in the specified data timestamp with the values that you specified. This feature checks the result of value replacement for scheduling parameters.
    Note Each time you modify the scheduling parameters, you must save and commit the modification before you perform smoke testing in the development environment. Otherwise, the new values of scheduling parameters do not take effect.
  • View Log of Smoke Testing in Development Environment (View Log of Smoke Testing in Development Environment icon): You can click this icon to view the operation logs of a node that runs in the development environment.
  • Access Scheduling System in Development Environment (Access Scheduling System in Development Environment icon): You can click this icon to go to Operation Center in the development environment and perform O&M operations. For more information, see View auto triggered node instances.
  • Format Code (Format Code icon): You can click this icon to format the code of the current node. This prevents a single line of code from being excessively long.
  • Share (Share icon): You can click this icon to share the current node with other users.
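The value replacement described for Run with Parameters and for smoke testing can be pictured as plain placeholder substitution. The following Python sketch is an illustration under stated assumptions, not DataWorks code: the `${name}` placeholder syntax, the `sales` table, the parameter names, and the convention that the data timestamp is the day before the run date are all assumed for the example.

```python
import re
from datetime import date, timedelta

# Matches placeholders of the assumed form ${name}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def render_sql(sql, values):
    """Replace each ${name} placeholder with values[name].

    Raises KeyError if a placeholder has no assigned value.
    """
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), sql)


def data_timestamp(run_date):
    """Assumed convention: the day before the run date, formatted as yyyymmdd."""
    return (run_date - timedelta(days=1)).strftime("%Y%m%d")


sql = "SELECT * FROM sales WHERE ds = '${bizdate}' AND region = '${region}';"
values = {"bizdate": data_timestamp(date(2024, 3, 1)), "region": "emea"}
print(render_sql(sql, values))
# SELECT * FROM sales WHERE ds = '20240229' AND region = 'emea';
```

Checking the rendered statement before a run is a quick way to confirm that every variable received the value you intended.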
2
Properties tab:
  • General: In this section, you can view the name, ID, and type of the node and set the Owner and Description parameters for the node.
  • Parameters: In this section, you can add scheduling parameters for the node and dynamically assign values to the parameters.
  • Schedule: In this section, you can configure time properties for scheduling the node after the node is deployed in the production environment. The time properties include the instance generation mode, recurrence and scheduled time of instances, rerun properties, and timeout period for the node.
  • Resource Group: In this section, you can specify a resource group for scheduling for the node.
  • Dependencies: In this section, you can configure node dependencies. For more information, see Configure same-cycle scheduling dependencies and Configure previous-cycle scheduling dependencies.
  • Input and Output Parameters: In this section, you can use context-based parameters to pass the value of the output parameter of an ancestor node to a descendant node based on the assignment feature.

Lineage tab: This tab displays the dependencies and auto-captured lineage between the current node and other nodes.

Versions tab: A version is generated each time a node is committed and deployed. On this tab, you can view the historical versions and information about each version of the node. The information includes the user that committed the node, the time when the node was committed, change type, status, and remarks. The following content describes the different states of a node version:
  • Yes: The node has been committed to the development environment, but a deployment task has not been created for the node on the Create Deploy Task page.
  • Deployed: The node has been deployed in the production environment. You can view the node on the Cycle Task page in Operation Center in the production environment. For more information, see View and manage auto triggered nodes.
  • Not Deployed: The node has been committed to the development environment but not deployed to the production environment. If you commit the node again, the previously committed version becomes a pending version.
  • The deployment is cancelled: If you commit a node but cancel the deployment of the committed node on the Create Deploy Task page, the state of this committed version becomes The deployment is cancelled.

Code Structure tab: This tab uses SQL operators to display the code structure of the node.

3 SQL Editor: You can write SQL statements in the editor based on your business requirements.
  • You can click the Top icon to return to the first line of the SQL editor.
  • You can click the Toggle Full Screen View icon to view the SQL editor in full screen.
  • You can click the Run icon to quickly run a code snippet to test whether the code snippet is correctly written. For more information, see Debug a code snippet: Quickly run a code snippet.
    Note This icon is displayed only when you click a line of code.
4 Features in the upper-right corner:
  • Deploy: You can click Deploy to go to the Create Deploy Task page. On this page, you can view the deployment details of the node or perform O&M operations in the production environment after the node is deployed.
  • Operation Center: You can click Operation Center to go to Operation Center in the production environment and perform O&M operations.

Shortcut menu related to nodes

Move the pointer over a node and right-click the node. The following figure shows the shortcut menu that appears, and the following table describes the commands supported by the shortcut menu. Node-related shortcut menu
Command Description
Rename This command allows you to change the name of the node.
Add to Favorites This command allows you to add the node to favorites. After you add the node to favorites, you can click My Favorites in the upper-right corner of the Scheduled Workflow tab to view the node. If you want to remove the node from favorites, right-click the node and select Remove from Favorites.
Move This command allows you to move the node to another workflow.
Clone This command allows you to clone the node. The new node is of the same type and has the same owner and resource properties as the original node but has a name that is different from that of the original node.
View Versions This command allows you to view the historical versions and information about each version of the node. The information includes the user that committed the node, the time when the node was committed, change type, status, and remarks.
Locate in Operation Center This command navigates you to Operation Center so that you can view information about the node. If the node is committed to both the development and production environments, you can select Locate in Operation Center (Production Environment) or Locate in Operation Center (Development Environment).
Submit for Code Review This command commits the code of the node for review. A node that is committed by a developer must pass the code review before it can be deployed.
Delete This command deletes the node and the dependency configurations of its ancestor and descendant nodes. After you click Delete to delete a node that has been deployed to the production environment, you must go to the Create Deploy Task page, create a deployment task for the node, and then deploy the node. This way, the node is deleted from the production environment. For more information, see Delete a node.