This topic describes the basic development process of Data Studio.
Prerequisites
You have created an Alibaba Cloud account or a RAM user. For more information, see Prepare an Alibaba Cloud account and Prepare a RAM user.
You have activated DataWorks. For more information, see Activate DataWorks.
You have created a DataWorks workspace and enabled Data Studio.
This guide applies to Data Studio. Make sure that Data Studio is enabled in your workspace. You can use one of the following methods to enable Data Studio:
When you create a workspace, select Participate in Data Studio public preview.
To upgrade from the previous version to Data Studio, click the Upgrade button at the top of the previous version interface and follow the on-screen instructions to complete the upgrade.
Since February 19, 2025, Data Studio is enabled by default if you activate DataWorks and create a workspace for the first time by using your Alibaba Cloud account in the following regions:
China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Hong Kong), Singapore, Indonesia (Jakarta), Germany (Frankfurt)
You have created a Serverless resource group and bound it to your DataWorks workspace. For more information, see Create and use a Serverless resource group.
You have bound computing resources to your DataWorks workspace. You can select computing resources based on your requirements. For more information, see Bind computing resources (participate in Data Studio public preview).
Access the Data Studio interface
Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and open Data Studio from the Actions column.
Directory planning
Data Studio allows you to develop data in different types of directories. The following table describes their differences so that you can choose one based on your needs.
| Directory type | Permission scope | Features | Scenarios |
| --- | --- | --- | --- |
| Personal directory | Personal account level | Files are visible only to the current account, cannot be scheduled or published to production, and are visible across workspaces for cross-workspace code synchronization. | Personal development and testing |
| Project directory | Workspace level | Files can be developed collaboratively by teams, support multiple node types and workflow orchestration, and can be published to the production environment for recurring scheduling. | Production tasks that require recurring scheduling |
| Manual directory | Workspace level | Files are run only when they are manually triggered and are not scheduled on a recurring basis. | Temporary manually executed tasks |
Data development
After you understand the differences between the directory types, create development directories based on your business requirements.
Personal directory development (for personal testing, temporary queries, and cross-project code synchronization)
Files in the personal directory are visible only to the current account and cannot be scheduled or published to production, which makes them suitable for personal testing and temporary queries. Files in the personal directory are also visible across workspaces and can therefore be used for cross-workspace code synchronization: you can submit a file from your personal directory to the project directory of a different workspace, and then configure scheduling and publishing in that workspace's project directory. For more information, see Personal directory.
In the left navigation bar of Data Studio, click the data development icon to access the data development directory.
In the Personal Directory section, click the create icon to create a directory, and then create files in the directory as needed.
If you need to submit files from the personal directory to the workspace project directory, you can click Submit To Project Directory at the top of the editing window. For subsequent steps, see Project directory development (for production environment).
Project directory development (for production environment)
Files in the project directory can be developed collaboratively by teams. The project directory supports creating different types of nodes and orchestrating upstream and downstream relationships between nodes. For more information, see Project directory.
In the left navigation bar of Data Studio, click the data development icon to access the data development directory.
Create project directories, nodes, and workflows.
In the Project Directory section, click the create icon to create directories, nodes, or workflows as needed.
Directory: You can use directories to manage nodes and workflows.
Node: Data Studio supports various node types, such as Data Integration, Notebook, and MaxCompute SQL. For the functions and differences of different nodes, see Node development.
Workflow: A workflow manages a data processing pipeline in an automated way. It provides a visual canvas on which you can drag and drop various types of task nodes and establish dependencies between them, which accelerates the construction of data processing pipelines and improves task development efficiency. For more information, see Recurring workflow.
Node orchestration.
Node: For individually created nodes, you need to configure upstream and downstream relationships through node scheduling dependencies.
Click Scheduling Configuration on the right side of the node editing page and configure the Node Scheduling parameters to define the upstream and downstream dependencies of the node. Through these dependencies, upstream and downstream nodes run in order: a downstream node starts only after its upstream nodes run successfully, which ensures that the node obtains data in a timely and correct manner (see the sketch below).
Workflow: In a workflow, you can visually orchestrate upstream and downstream relationships between nodes by dragging and dropping them on the canvas.
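As an illustration of why this ordering matters, consider two MaxCompute SQL nodes in which an upstream node produces a daily summary table that a downstream node reads. This is a minimal sketch: the table names, column names, and partition value are hypothetical.

```sql
-- Hypothetical upstream node: writes one day's partition of a summary table.
INSERT OVERWRITE TABLE dws_user_event_1d PARTITION (ds = '20250101')
SELECT  user_id
        , COUNT(*) AS event_cnt
FROM    ods_user_event
WHERE   ds = '20250101'
GROUP BY user_id;

-- Hypothetical downstream node: reads that partition. If it ran before the
-- upstream node succeeded, the partition could be missing or incomplete,
-- which is what the dependency configuration prevents.
SELECT  user_id
        , event_cnt
FROM    dws_user_event_1d
WHERE   ds = '20250101';
```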
Node development.
Data Studio supports various node types with different configurable content. You can refer to Node development to complete node configuration.
Note: You can define variables in the ${variable name} format during node development, assign constant values to the variables during testing, and dynamically assign values to the variables in the scheduling configuration.
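For example, a MaxCompute SQL node might use a ${bizdate} variable to select one day's partition. This is a minimal sketch: the table name and partition column are hypothetical, and only the ${variable name} syntax is the point.

```sql
-- ${bizdate} is a variable defined in the node code. Assign it a constant
-- in Debug Configuration for testing, or a scheduling parameter in
-- Scheduling Configuration for production scheduling.
SELECT  user_id
        , event_type
FROM    ods_user_event      -- hypothetical source table
WHERE   ds = '${bizdate}';  -- ds: hypothetical date partition column
```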
Manual directory development (for manually triggered node instances)
You can create manually triggered tasks or manually triggered workflows in the manual directory for one-time data processing scenarios that do not require recurring scheduling.
In the left navigation bar of Data Studio, click the manual directory icon to access the manual directory.
Create development directories and nodes under Manually Triggered Task or Manually Triggered Workflow as needed. For more information, see Manually triggered task and Manually triggered workflow.
Testing
After node development is complete, you can click Debug Configuration on the right side of the node editing page, configure debug parameters, and then click Run in the toolbar to run the node code using the debug parameters.
When configuring debugging, you can set the following parameters:
Specify the computing resource for task submission during debugging in Computing Resource.
Specify the resource group used for DataWorks task execution in DataWorks Configuration.
If you have defined variables in the ${variable name} format in your code, you can assign constant values to the corresponding variables in Script Parameters (see the example after this list).
Recurring workflows do not currently support directly debugging the entire workflow. You need to debug each internal node separately.
Manually triggered workflows support running the entire workflow directly.
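For example, to debug the ${bizdate} sketch from the node development step, you could enter an assignment such as the following in Script Parameters. The value is an arbitrary example date.

```
bizdate=20250101
```

With this assignment, ${bizdate} is replaced with 20250101 for the debug run only; scheduling parameters still control the value in scheduling scenarios.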
Scheduling configuration and publishing
Scheduling configuration
After node debugging is complete, if the node needs to be published to the production environment for recurring automatic scheduling, click Scheduling Configuration on the right side of the node editing page to configure scheduling properties for the node.
Scheduling Parameters: Used to define the parameters that are used when the node is scheduled. DataWorks provides multiple assignment formats. If you have defined variables in the ${variable name} format during node development, scheduling parameters can dynamically assign values to those variables in scheduling scenarios (see the example below).
Scheduling Policy: Used to define scheduling attributes other than the execution frequency and specific execution time points of the node in the scheduling environment.
Scheduling Time: Used to define the execution frequency and specific execution time points for nodes in the scheduling environment.
Scheduling Dependencies: Used to define the upstream and downstream dependencies of the task. Through these dependencies, upstream and downstream nodes run in order: a downstream node starts only after its upstream nodes run successfully, which ensures that the node obtains data in a timely and correct manner.
The scheduling configuration for recurring workflows differs from that for individual nodes. For more information, see Recurring workflow.
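For example, assuming the ${bizdate} variable from the earlier sketches, you could assign it the data timestamp in Scheduling Parameters so that each scheduled instance processes the partition that matches its own data timestamp. The exact expression syntax is described in the scheduling parameters documentation; a common assignment looks like this:

```
bizdate=$bizdate
```

With this assignment, an instance whose data timestamp is 20250101 runs the node code against the ds='20250101' partition, without any change to the code itself.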
Node publishing
After the scheduling configuration is complete, click the Publish button in the toolbar at the top of the node editing page to start the publishing process, and then click Start Publishing To Production. The task is published after it passes the publishing checks, and the node is then scheduled periodically in the production environment. For more information, see Node/workflow publishing.
Because the publishing process is governed by the checkers that are enabled, publishing may fail. We recommend that you confirm the final publishing status of the task in the production environment after the publishing process is complete.
Task operations and maintenance
After node publishing is complete, recurring tasks will be generated in the production environment of Operation Center. You can go to Operation Center to view or adjust the properties and status of recurring tasks, and perform data backfill for specified data timestamps.
Quick experience
When you open Data Studio, the Welcome page opens by default at startup. You can follow the page guidance to experience classic Notebook cases or complete the Data Studio getting started tutorial.