This tutorial walks you through creating a periodically scheduled node in DataStudio from scratch. By the end, you'll know how to:
- Create a workflow and tables in a MaxCompute compute engine
- Write and configure an ODPS SQL node
- Set up scheduling parameters and dependencies
- Debug, commit, smoke test, and deploy the node to production
The example uploads a local file to a source table, then uses a compute node to cleanse and process the data into a result table.
Prerequisites
Before you begin, make sure you have:
- Completed the environment setup described in Data development: Developers
- Added a MaxCompute data source to your workspace (required to create an ODPS SQL node)
- An Alibaba Cloud account or a RAM user assigned the Workspace Administrator or Develop role
How it works
DataWorks organizes development work into workflows, where each node represents a unit of computation. DataStudio provides a visualized development interface for nodes of various compute engines, such as MaxCompute, Hologres, E-MapReduce (EMR), and CDH. For more information, see Overview.
The typical data flow is:
- A data synchronization node ingests raw business data into a source table.
- A compute node cleanses and transforms the source data, then writes results to a result table.
In this tutorial, you skip the synchronization node by uploading data directly from your local machine. You then create a compute node to process the data.
Step 1: Create a workflow
Workflows are the organizational unit in DataStudio. All nodes live inside a workflow, so create one before doing any development.
- Log on to the DataWorks console. In the top navigation bar, select your region. In the left-side navigation pane, choose Data Development and Governance > Data Development. Select your workspace and click Go to Data Development.
- In the Scheduled Workflow pane, create a workflow using one of these methods:
  - Move the pointer over the icon and click Create Workflow.
  - Right-click Business Flow and select Create Workflow.
- In the Create Workflow dialog box, enter a workflow name and description, then click Create. In this example, the workflow name is Create the first auto triggered node.
For more information, see Create a workflow.
Step 2: Create tables
Before writing any node code, create the tables that will hold your raw and processed data. This example uses two MaxCompute tables.
Create and define the tables
- In the Scheduled Workflow pane, click Business Flow, find your workflow, right-click MaxCompute, and select Create Table.
- In the Create Table dialog box, set the Engine Instance and Name parameters. Create the following two tables:

  | Table name | Description |
  |---|---|
  | bank_data | Stores raw business data |
  | result_table | Stores the cleansed and processed data |

- On the configuration tab of each table, switch to DDL mode and paste the DDL statement to generate the table schema. Set the Display Name parameter in the General section.

  DDL for bank_data:

  CREATE TABLE IF NOT EXISTS bank_data
  (
      age            BIGINT COMMENT 'Age',
      job            STRING COMMENT 'Job type',
      marital        STRING COMMENT 'Marital status',
      education      STRING COMMENT 'Education level',
      default        STRING COMMENT 'Credit card',
      housing        STRING COMMENT 'Mortgage',
      loan           STRING COMMENT 'Loan',
      contact        STRING COMMENT 'Contact information',
      month          STRING COMMENT 'Month',
      day_of_week    STRING COMMENT 'Day of the week',
      duration       STRING COMMENT 'Duration',
      campaign       BIGINT COMMENT 'Number of contacts during the campaign',
      pdays          DOUBLE COMMENT 'Interval from the last contact',
      previous       DOUBLE COMMENT 'Number of contacts with the customer',
      poutcome       STRING COMMENT 'Result of the previous marketing campaign',
      emp_var_rate   DOUBLE COMMENT 'Employment change rate',
      cons_price_idx DOUBLE COMMENT 'Consumer price index',
      cons_conf_idx  DOUBLE COMMENT 'Consumer confidence index',
      euribor3m      DOUBLE COMMENT 'Euro deposit rate',
      nr_employed    DOUBLE COMMENT 'Number of employees',
      y              BIGINT COMMENT 'Time deposit available or not'
  );

  DDL for result_table:

  CREATE TABLE IF NOT EXISTS result_table
  (
      education STRING COMMENT 'Education level',
      num       BIGINT COMMENT 'Number of persons'
  )
  PARTITIONED BY
  (
      day  STRING,
      hour STRING
  );

- In the top toolbar, click Commit to Development Environment, then click Commit to Production Environment.

Table creation and updates take effect in the compute engine only after they are committed to the target environment. For more information, see Table creation statements and Create tables.
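
To confirm that both tables exist with the expected schema after you commit them, one option is to inspect them with DESC, for example from an ad hoc query. This is a minimal sketch that assumes the query runs against the same MaxCompute project the tables were committed to.

```sql
-- Show the columns, types, and comments of the source table.
DESC bank_data;

-- Show the columns and the partition columns (day, hour) of the result table.
DESC result_table;
```

If either statement reports that the table does not exist, check that the commit to the corresponding environment succeeded.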
Upload data
Upload the sample file banking.txt from your local machine to the bank_data table.
For detailed steps, see Upload a file from your on-premises machine to the bank_data table.
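
After the upload finishes, a quick sanity check along the following lines can confirm that rows actually landed in bank_data. The queries are only a sketch; the row count you should expect depends on your copy of banking.txt.

```sql
-- Count the uploaded rows; a result of 0 suggests the upload did not succeed.
SELECT COUNT(*) AS row_count
FROM bank_data;

-- Peek at a few rows to check that the columns were mapped in the right order.
SELECT *
FROM bank_data
LIMIT 10;
```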
Step 3: Create a node
Nodes are the building blocks of data development in DataWorks. Choose the node type based on the compute engine you want to use.
Create a node using one of these methods:
- From the Scheduled Workflow pane: Right-click the compute engine, move the pointer over Create Node, and select the node type.
- From the workflow canvas: Double-click the workflow name to open the canvas, then click or drag the required node type from the left-side section.
In the Create Node dialog box, set the Engine Instance and Name parameters.
This example creates an ODPS SQL node named result_table — the same name as the result table. Naming the node after its output table makes it easy to trace which node produced a given table.
Step 4: Configure the node
Open the node by double-clicking its name. Write the node code using the syntax of the target compute engine.
This example reads data from bank_data, counts the records for each education level, and writes the results to a partition in result_table. The target partition is defined by the day and hour variables.

INSERT OVERWRITE TABLE result_table partition (day='${day}', hour='${hour}')
SELECT education
, COUNT(marital) AS num
FROM bank_data
GROUP BY education;
To use variables in scheduling scenarios, define them in the code as ${variable_name} and assign values in step 5.
For supported scheduling parameter formats, see Supported formats of scheduling parameters. For node code syntax, see Create and use nodes.
Step 5: Configure scheduling properties
Scheduling properties control when and how often DataWorks runs your node. Click the Properties tab in the right-side pane of the node configuration tab.
Configure rerun settings and ancestor nodes before committing the node in step 7.
| Section | What to configure |
|---|---|
| General | Automatically populated with node name, ID, type, and owner. Modify the owner if needed — only workspace members can be set as owner. Note: the node ID is automatically generated after the node is committed. |
| Scheduling parameter | Assign values to the variables defined in step 4. In this example, assign ${yyyymmdd} to day and $[hh24] to hour. Each hourly run then writes its results to the matching day and hour partition in result_table. |
| Schedule | Set the scheduling cycle, start time, rerun settings, and timeout. In this example, the node runs every hour starting at 00:00. |
| Resource group | Select the resource group for scheduling. By default, a serverless resource group is provided when you activate DataWorks. See Create and use a serverless resource group. |
| Dependencies | Configure the ancestor nodes that must complete before this node runs. If the node queries data generated by other nodes, configure the ancestor node using one of these methods: (a) If the ancestor node is outside the current workflow, enter the output name of the ancestor node in the Parent Nodes table. (b) If the ancestor node is inside the current workflow, configure the dependency by drawing lines on the workflow canvas. In this example, the result_table node reads from bank_data, which is not generated by another node in the workflow. Set the workspace root node as the ancestor node. |
| (Optional) Input and output parameters | Configure parameters passed between nodes. Required only when using assignment nodes. |
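
As a rough illustration of how the two parameter assignments resolve, the sketch below shows the statement from step 4 as it might look for a single hourly instance. It assumes a hypothetical instance scheduled for 2022-09-07 14:00. It also assumes the default behavior in which ${...} expressions are evaluated against the data timestamp (the day before the scheduling date) and $[...] expressions against the scheduling time. Check Supported formats of scheduling parameters before relying on these values.

```sql
-- Hypothetical instance: scheduling time 2022-09-07 14:00.
--   day  = ${yyyymmdd} -> 20220906 (assumed: data timestamp, one day before the scheduling date)
--   hour = $[hh24]     -> 14       (assumed: hour of the scheduling time)
INSERT OVERWRITE TABLE result_table PARTITION (day='20220906', hour='14')
SELECT education
     , COUNT(marital) AS num
FROM bank_data
GROUP BY education;
```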
Step 6: Debug the node
Before committing, verify that your code runs correctly. The recommended path for this tutorial is Run with Parameters, which lets you assign test values to the variables defined in step 4.
- In the top toolbar, click the Run with Parameters icon.
- In the dialog box, assign constant values to the variables defined in step 4.
- Review the output to confirm the results are correct.
In this example, the test run uses 2022.09.07 14:00 as the test timestamp.
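
One way to review the output is to query the partition that the test run wrote. The sketch below assumes you assigned the constants 20220907 to day and 14 to hour in the Run with Parameters dialog box. These values are illustrative, so substitute whatever you entered.

```sql
-- Read back the partition written by the test run.
-- The partition filter also avoids a full scan of the partitioned table.
SELECT education
     , num
FROM result_table
WHERE day = '20220907'
  AND hour = '14';
```

The query should return one row per distinct education value found in bank_data.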
If you need a different debugging approach:
| Debug feature | Best for |
|---|---|
| Quick run | Running a selected code snippet quickly |
| Run | Full-code debugging with saved variable assignments (saved after first run) |
| Run with Parameters | Full-code debugging when you need to change variable values each time |
Step 7: Save and commit the node
After debugging, save the node and commit it to the development environment.
Before committing, confirm that you have configured rerun settings and ancestor nodes in step 5.
- Click the save icon in the top toolbar to save the node.
- Click the commit icon to commit the node to the development environment.
Step 8: Perform smoke testing
Smoke testing validates that the scheduling parameters are configured correctly before you deploy to production. Run it in the development environment after committing the node.
- Click the smoke testing icon and specify the data timestamp for the test.
- After the test completes, click the icon for viewing smoke testing results to view the results.
In this example, the result_table node runs hourly from 00:00 to 23:59. Smoke testing generates two instances with scheduling times of 00:00 and 01:00.
Auto triggered instances are snapshots generated for a node each time it is scheduled. For hourly nodes, specify both the start and end timestamps when running the smoke test. For more information, see Perform smoke testing.
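
To see which partitions the smoke test instances produced, you could list the partitions of the result table in the development environment. This is a sketch only: the day value depends on the data timestamp you chose, and the hour values should correspond to the scheduling times of the generated instances.

```sql
-- List all partitions of the result table; each smoke test instance
-- is expected to have written one day/hour partition.
SHOW PARTITIONS result_table;
```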
Step 9: Deploy the node
DataWorks only schedules nodes that are deployed to the production environment.
- Basic mode workspaces: The node is periodically scheduled as soon as it is committed.
- Standard mode workspaces: Committed changes enter a pending state. Click Deploy to open the Create Deploy Task page and push changes to production.
Click Deploy, review the pending operations (additions, updates, and deletions), and confirm the deployment. For detailed steps, see Deploy nodes.
| Deployment detail | Description |
|---|---|
| Deployment control | Developers can create deployment packages. Deploying them requires O&M permissions. Check the deployment status on the Deployment Packages page. |
| Instance generation timing | If you deploy between 23:30 and 24:00, instances take effect on the third day. This applies to nodes with the instance generation mode set to Next Day or Immediately After Deployment. See Configure immediate instance generation for a task. |
For differences between basic mode and standard mode workspaces, see Differences between workspaces in basic mode and workspaces in standard mode.
What's next
Go to Operation Center and open the Auto Triggered Tasks page to view your deployed node and perform O&M operations. For more information, see Perform basic O&M operations on auto triggered nodes.