This topic describes the basic development process in the Data Studio (new version).
Prerequisites
A DataWorks workspace is created, and Data Studio (new version) is enabled for the workspace. You can enable Data Studio in the following ways:
When you create a workspace, select Use Data Studio (New Version).
You can upgrade an existing workspace from DataStudio (legacy version) to Data Studio. On the DataStudio page, click the Upgrade button at the top and follow the on-screen instructions to complete the process.
After February 18, 2025, Data Studio (new version) is enabled by default when an Alibaba Cloud account activates DataWorks and creates a workspace for the first time in one of the following regions:
China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Thailand (Bangkok), Germany (Frankfurt), UK (London), US (Silicon Valley), US (Virginia)
A computing resource is bound to the DataWorks workspace. You can select a computing resource based on your needs. For more information, see Bind a computing resource.
Enter Data Studio
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and open Data Studio from the Actions column.
Directory planning
Data Studio supports data development in different directories. The differences between the directories are described below.
| Directory type | Permission scope | Features | Use cases |
| --- | --- | --- | --- |
| Personal directory | Personal account level | Files are visible only to the current account. They cannot be scheduled or published to production, but they are visible across your workspaces and support cross-workspace synchronization. | Personal development and testing |
| Workspace directories | Workspace level | Files support collaborative team development. You can orchestrate dependencies, configure scheduling, and publish to the production environment. | Production tasks that require recurring scheduling |
| Manually triggered workflow/task directory | Workspace level | Workflows and tasks are run manually and are not scheduled on a recurring basis. | Temporary, manually run tasks |
Data development
After you understand the differences between the directories and their use cases from the directory planning, create a development directory based on your business needs.
Personal directory (for personal testing, temporary queries, and cross-project code synchronization)
Files in the Personal Folder are visible only to the current account. You can use these files for personal testing or temporary queries, but you cannot configure scheduling for them or publish them to the production environment. The Personal Folder is visible across all your workspaces and supports cross-workspace synchronization. To schedule and publish a file, you must first submit it from the Personal Folder to a Project Folder. You can then configure scheduling and publish the file from that Project Folder. For more information, see Personal Folder.
In the navigation pane on the left, go to the Data Development pane. In the Personal Directory section, create a folder, and then create files in it as needed. To submit a file from the personal directory to a workspace directory, click Commit To Workspace Directory at the top of the editing window. For the next steps, see Workspace directories (for production environments).
Workspace directories (for production environments)
Files in the Project Folder support collaborative team development. You can create different types of nodes and orchestrate their upstream and downstream dependencies. For more information, see Project Folder.
In the navigation pane on the left, go to the Data Development pane, and then create a project directory, nodes, and workflows.
In the Workspace Directories section, you can create a folder, node, or workflow.
Directory: Use directories to organize nodes and workflows.
Node: Data Studio supports a wide range of node types, such as Data Integration, Notebook, and MaxCompute SQL. For more information about the functions and differences of various nodes, see Node development.
Workflow: A workflow is a tool for automating the management of data processing. It provides a visual canvas that lets you integrate various types of subtask nodes by dragging and dropping them. This makes it easy to establish dependencies between tasks, accelerate the creation of data processing flows, and improve development efficiency. For more information, see Auto triggered workflows.
Node orchestration.
Node: For a standalone node, you must configure its upstream and downstream dependencies in the scheduling dependency settings.
On the node editing page, click Scheduling in the right pane. Configure the Node Scheduling parameters to define the upstream and downstream dependencies for the node. Dependencies ensure that nodes run in the correct order. A descendant node runs only after its ancestor nodes run successfully. This ensures that the current node can retrieve the correct data at the right time.
Workflow: A workflow lets you visually orchestrate the upstream and downstream dependencies of nodes on a canvas. You can plan the orchestration as required.
Node development.
Data Studio supports a wide range of node types, and the configurable content varies by node type. For more information about how to complete the node configuration, see Node development.
Note: During node development, you can define variables in the ${Variable_Name} format. You can then assign constant values to the variables during the testing phase and dynamically assign values to them in the scheduling configuration.
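For example, a MaxCompute SQL node could use such a variable to parameterize the partition it reads and writes. This is a minimal sketch: the table, column, and variable names below are illustrative, not part of the product.

```sql
-- Illustrative MaxCompute SQL node; table and variable names are hypothetical.
-- ${bizdate} is a user-defined variable in the ${Variable_Name} format.
-- During debugging, you can assign it a constant (for example, 20250101)
-- in Script Parameters; in the scheduling configuration, you can map it
-- to a scheduling parameter so each scheduled instance gets a dynamic value.
INSERT OVERWRITE TABLE dwd_orders PARTITION (ds = '${bizdate}')
SELECT order_id, buyer_id, amount
FROM   ods_orders
WHERE  ds = '${bizdate}';
```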
Manually triggered workflow/task directory
You can create manually triggered workflows or tasks for scenarios that do not require recurring scheduling.
In the navigation pane on the left, go to the Manual Folder. Create development folders and nodes under Manually Triggered Workflows or Manually Triggered Tasks as needed. For more information, see Manually triggered task and Manually triggered workflow.
Test
After you finish developing a node, click Debugging Configurations on the right side of the node editing page. Then, click Run in the toolbar to execute the node.
When you configure debug settings, you can set the following parameters:
In Computing Resource, specify the computing resource for submitting the task for debugging.
In Resource Group, specify the resource group for DataWorks task execution.
If you defined variables in your code in the ${Variable_Name} format, you can assign constant values to them in Script Parameters.
Auto triggered workflows do not support debugging the entire workflow. You must debug each inner node individually.
Manually triggered workflows support running the entire workflow.
Scheduling configuration and publishing
Scheduling configuration
After you debug the node, if it needs to be published to the production environment for recurring automatic scheduling, click Scheduling on the right side of the node editing page to configure its scheduling properties.
Scheduling Parameters: Define the parameters used for node scheduling. DataWorks provides multiple assignment formats. If you defined variables in the ${Variable_Name} format during node development, you can use scheduling parameters to dynamically assign values to those variables in scheduling scenarios.
Scheduling Policies: Define the scheduling properties of the node in the scheduling environment, other than its execution frequency and specific execution time.
Scheduling Time: Define the execution frequency and specific execution time for the node in the scheduling environment.
Scheduling Dependencies: Define the upstream and downstream dependencies for the task. Dependencies ensure that nodes run in the correct order. A descendant node starts only after its ancestor nodes run successfully. This process ensures that the current node retrieves data correctly and at the right time.
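To make the variable-to-parameter mapping concrete, the sketch below shows one possible setup. The parameter names, the table, and the date expression are examples of the kind of assignment DataWorks supports; verify the exact expression syntax in the scheduling parameters documentation for your region and version.

```sql
-- Hypothetical Scheduling Parameters entries (configured in the console,
-- shown here as comments):
--   bizdate = $[yyyymmdd-1]   -- a date expression resolved per instance,
--                             -- here the day before the data timestamp
--   region  = cn_demo         -- a constant value (illustrative)
-- The node code references the same variables:
SELECT COUNT(*) AS order_cnt
FROM   ods_orders             -- hypothetical source table
WHERE  ds = '${bizdate}' AND region = '${region}';
```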
The scheduling configuration for an auto triggered workflow is different from that of a standalone node. For more information, see Auto triggered workflow.
Node publishing
After you configure the scheduling properties of the node, click the Publish button at the top of the node editing page. The node is then published to the production environment and scheduled to run periodically. For more information, see Publish a node or workflow.
Click the Publish button in the toolbar, and then click Start Deployment to Production Environment. The task is published after it passes the configured publishing checks.
Publishing may fail if an enabled checker blocks it. After the publishing process is complete, confirm the final status of the task in the production environment.
Node O&M
After a node is published, an auto triggered task is generated in the production environment of the Operation Center. You can go to the Operation Center to view or adjust the properties and status of the auto triggered task and perform a data backfill for a specific data timestamp.
Quick start
When you open Data Studio, the Welcome page is displayed by default. You can follow the on-screen instructions to try a classic Notebook example or complete the Data Studio introductory tutorial.