Develop and schedule Doris tasks using a Doris node - DataWorks

Use the DataWorks Doris node to write, schedule, and orchestrate Doris SQL tasks as part of a recurring pipeline.

Background information

Apache Doris is a high-performance, real-time analytical database. It is well-suited for scenarios such as report analysis, ad hoc queries, and data lake federated query acceleration. For more information, see Doris Introduction.

How it works

A Doris node is the basic unit of work in DataStudio. Each node holds the SQL code for one Doris task and carries its own scheduling configuration — when to run, which upstream nodes must complete first, and how to handle reruns. Once published, the node runs automatically on the defined schedule, and you monitor its execution status in Operation Center.

A node lives inside a Business Flow, which is DataStudio's way of grouping related nodes into a single pipeline. Create the Business Flow first, then add nodes to it.

Prerequisites

Before you begin, ensure that you have:

A Business Flow. For more information, see Create a workflow.
A Doris data source created with a JDBC connection string. For setup instructions, see Data Source Management and Doris data source.
Network connectivity between the data source and the resource group you plan to use. For configuration options, see Network connection solutions.
(RAM users only) The Develop or Workspace Administrator role in the workspace. Grant the Workspace Administrator role with caution — it carries broad privileges. For more information, see Add members to a workspace.

Limitations

Supported regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Germany (Frankfurt), US (Silicon Valley), and US (Virginia).

Step 1: Create a Doris node

Go to the DataWorks console at dataworks.console.aliyun.com. In the top navigation bar, select the target region.
In the left-side navigation pane, choose Data Development and O&M > Data Development. Select your workspace from the drop-down list and click Go to Data Development.
In DataStudio, right-click the target Business Flow and choose Create Node > Database > Doris.
In the Create Node dialog box, enter a Name and click OK.

Step 2: Write the SQL code

Select a data source

If the workspace has multiple Doris data sources, select one from the drop-down list on the node configuration tab. If only one Doris data source exists, it is selected by default.

Note

Doris nodes support only Doris data sources created with a JDBC connection string.

Write SQL

Enter your SQL directly in the code editor. The following is a basic example:

SELECT * FROM usertablename;
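
The editor also accepts DDL. As a slightly fuller sketch, the statement below creates a table using standard Apache Doris syntax. The database example_db, the table daily_orders, its columns, and the bucket and replica settings are all hypothetical values chosen for illustration; adjust them to match your cluster.

```sql
-- Hypothetical table for illustration; names and settings are assumptions.
CREATE TABLE IF NOT EXISTS example_db.daily_orders (
    order_date DATE           NOT NULL COMMENT 'Business date of the order',
    order_id   BIGINT         NOT NULL COMMENT 'Order identifier',
    amount     DECIMAL(16, 2)          COMMENT 'Order amount'
)
DUPLICATE KEY(order_date, order_id)        -- detail model: rows are stored as written
DISTRIBUTED BY HASH(order_id) BUCKETS 10   -- bucket count is an arbitrary example
PROPERTIES (
    "replication_num" = "1"                -- one replica suits a test cluster only
);
```

IF NOT EXISTS keeps the statement safe to run again, which matters once the node is rerun by the scheduler.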

Use scheduling parameters for dynamic values

DataWorks scheduling parameters let you pass different values into a query at each run — for example, substituting yesterday's date or a changing filter value without editing the code. Define variables using the ${variable_name} format, then assign values in Scheduling Configuration > Scheduling Parameters in the right navigation pane.

SELECT '${var}'; -- var is assigned in Scheduling Configuration > Scheduling Parameters

For the full list of supported formats, see Supported formats of scheduling parameters. For assignment examples, see Configure and use scheduling parameters.
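
As a sketch of the date-substitution case described above, the query below filters a table by the business date. It assumes that a parameter named bizdate is assigned the built-in value $bizdate (a yyyymmdd string) in Scheduling Parameters. The table example_db.daily_orders and its columns are hypothetical names, not objects defined in this guide.

```sql
-- Assumed scheduling parameter assignment: bizdate=$bizdate
-- ${bizdate} is expected to be replaced with a yyyymmdd string at run time.
SELECT order_date,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
FROM example_db.daily_orders               -- hypothetical table
WHERE order_date = STR_TO_DATE('${bizdate}', '%Y%m%d')
GROUP BY order_date;
```

Converting the string explicitly with STR_TO_DATE avoids relying on implicit string-to-date casting in the comparison.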

Step 3: Configure scheduling

Click Scheduling Configuration in the right navigation pane and set the properties for this node.

Important

Configure Rerun Property and Upstream Dependent Node before submitting.

For a full reference of scheduling properties, see Overview.

Step 4: Debug the task

Run the task in the development environment to verify execution before publishing it to production.

(Optional) Click the icon in the toolbar to open the Parameters dialog box. Select a resource group and assign values to any scheduling parameters used in the SQL. For details on how parameter values are resolved during debugging, see Task debugging process.
Click the Save icon in the toolbar, then click the Run icon to run the task.
(Optional) Run a smoke test during or after submission to verify execution in the development environment. For more information, see Perform smoke testing.

Step 5: Submit and publish the task

Click the Save icon in the toolbar to save the node.
Click the Submit icon to open the Submit dialog box. Enter a Change Description and select your code review options.
Note
If code review is enabled, a reviewer must approve the code before it can be published. For more information, see Code review.
(Standard mode workspaces only) Click Publish in the upper-right corner to deploy the node to production. For more information, see Publish tasks.

What's next

After publishing, the node runs on the configured schedule. Click O&M in the upper-right corner to open Operation Center, where you can view the scheduling and running status of each recurring task instance. For more information, see Manage recurring tasks.