DataWorks: Assignment node

Last Updated: Mar 26, 2026

In complex data workflows, passing dynamic information between nodes—such as partition lists or business identifiers—is a common requirement. Using an intermediate table for this is inefficient: it adds unnecessary I/O overhead and storage complexity for small amounts of data. The assignment node solves this with a lightweight approach: it runs a short script (MaxCompute SQL, Python 2, or Shell), captures the last output, and passes it directly to downstream nodes as a parameter. This lets you build dynamic pipelines where tasks are configured based on results from upstream tasks.

Prerequisites

Before you begin, ensure that you have:

  • DataWorks Standard Edition or higher

  • The Development or Workspace Manager role in your DataWorks workspace. For details, see Add members to a workspace

How it works

The assignment node captures the last output of its script and assigns it to a system-generated output parameter named outputs. A downstream node reads this value by declaring a matching input parameter—and DataWorks automatically establishes the scheduling dependency between the two nodes when you do this. You get parameter passing and dependency wiring in a single configuration step.
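As an illustration, the capture rule can be simulated in plain bash. This is a local sketch of the behavior, not DataWorks internals:

```shell
# Local simulation of the assignment node's capture rule: only the
# LAST line the script prints becomes the value of `outputs`.
outputs=$(
  {
    echo "step 1: computing partition list"   # earlier output, ignored
    echo "beijing,hangzhou,shanghai"          # final output, captured
  } | tail -n 1
)
echo "captured outputs: $outputs"
```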

Parameter format

The format of outputs depends on the script language:

  • MaxCompute SQL: The value comes from the last SELECT statement and is passed as a two-dimensional array.

  • Python 2: The value comes from the last print statement, a single string split by commas (,) into a one-dimensional array. Example: 'Electronics,Clothing,Books' becomes ['Electronics','Clothing','Books']. Escape commas in element values as \,.

  • Shell: The value comes from the last echo statement and is processed the same way as Python 2: a single string split by commas into a one-dimensional array.
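To see what the one-dimensional array looks like to a consumer, here is a local bash sketch of splitting an outputs string on commas (at runtime, the DataWorks scheduler does this for you):

```shell
# Split a comma-separated outputs string into an array, mirroring how
# the Python 2 / Shell value is exposed downstream (local bash sketch).
value='Electronics,Clothing,Books'
IFS=',' read -r -a arr <<< "$value"
echo "${arr[0]}"   # Electronics
echo "${arr[1]}"   # Clothing
echo "${arr[2]}"   # Books
```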

Pass parameters between nodes

The following example shows how to pass data from an assignment node to a Shell node.

Step 1: Configure the assignment node

In your workflow, create an assignment node. Select MaxCompute SQL, Python 2, or Shell, then write the script.

Step 2: Configure the downstream node

The following steps use a Shell node as the downstream node.

  1. Create a Shell node.

  2. In the Scheduling panel on the right, select the Input and Output Parameters tab.

  3. In the Input Parameters section, click Create Parameter.

  4. Set the parameter name (for example, param) and set the value to the outputs parameter from the assignment node.

    DataWorks automatically creates a scheduling dependency between the Shell node and the assignment node after this configuration.
  5. Reference the passed value in your Shell script using ${param}.
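A minimal downstream Shell node body might look like the following sketch. The name param and the sample value are assumptions for illustration; in DataWorks, ${param} is replaced with the assignment node's outputs before the script runs, so the value is hard-coded here to keep the sketch runnable on its own:

```shell
# Hypothetical downstream Shell node script. DataWorks would substitute
# ${param} with the upstream outputs; a sample value stands in here.
param="beijing,1001"
echo "received from upstream: ${param}"
```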

Step 3: Deploy and verify

  1. On the workflow canvas, click Deploy in the toolbar and select Full Deployment.

  2. Go to Operation Center > Auto Triggered Node O&M > Auto Triggered Nodes.

  3. Run a smoke test on the workflow and verify the results.

Limitations

  • Immediate downstream only: Parameters can only be passed to direct downstream nodes, not across multiple levels. To pass parameters across levels, use a parameter node instead.

  • Size limit: The outputs value cannot exceed 2 MB; otherwise, the assignment node fails.

  • No comments in script code: Comments disrupt output parsing and cause the node to fail or return incorrect values.

  • No `WITH` clause: The WITH clause is not supported in MaxCompute SQL mode.

Examples by language

Pass a MaxCompute SQL query result

The result of a SQL query is passed to downstream nodes as a two-dimensional array.

Assignment node

The following SQL returns two rows and two columns:

SELECT 'beijing', '1001'
UNION ALL
SELECT 'hangzhou', '1002';

Shell node

Add an input parameter named region (or any name you choose) that references the assignment node's outputs. DataWorks performs static replacement at runtime.

  • Script: echo "Full result set: ${region}"
    Output: Full result set: beijing,1001

  • Script: echo "First row: ${region[0]}"
    Output: First row: beijing,1001

  • Script: echo "Second field of first row: ${region[0][1]}"
    Output: Second field of first row: 1001

Pass a Python 2 output

The last print statement is split by commas into a one-dimensional array.

Assignment node

print 'Electronics,Clothing,Books'

Shell node

Add an input parameter named types that references the assignment node's outputs. DataWorks performs static replacement at runtime.

  • Script: echo "Full result set: ${types}"
    Output: Full result set: Electronics,Clothing,Books

  • Script: echo "Second element: ${types[1]}"
    Output: Second element: Clothing

Shell nodes follow the same processing logic as Python 2 nodes.
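For example, a Shell assignment node that publishes a list for downstream nodes could be sketched like this (standalone bash; in DataWorks, only the final echo is captured as outputs):

```shell
# Shell assignment node sketch: earlier echoes are logging only;
# the last echo becomes the outputs value, later split on commas.
biz_list="ecom,finance,logistics"
echo "preparing business line list"   # ignored by the capture
echo "$biz_list"                      # captured as outputs
```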

Use case: Batch process data from partitioned tables

A common pattern combines an assignment node with a for-each node: the assignment node outputs a list, and the for-each node iterates over each element. This keeps processing logic in one place and makes it easy to extend to new items.

The following example shows how to use this pattern to process user behavioral data from multiple lines of business.

Background

You are a data developer responsible for processing user behavioral data from three lines of business: e-commerce, finance, and logistics. The same aggregation logic (calculating daily page views per user) runs every day, and more lines of business may be added in the future.

Source tables (DWD layer):

  • dwd_user_behavior_ecom_d: E-commerce user behavior, partitioned by day (dt)

  • dwd_user_behavior_finance_d: Finance user behavior, partitioned by day (dt)

  • dwd_user_behavior_logistics_d: Logistics user behavior, partitioned by day (dt)

  • dwd_user_behavior_${line-of-business}_d: User behavior tables for other potential lines of business

These tables have the same schema and are partitioned by day (dt).

Destination table (DWS layer):

  • dws_user_summary_d: Aggregate table, partitioned by biz_line and dt

Creating a separate task for each line of business is hard to maintain. By combining an assignment node and a for-each node, you maintain one set of processing logic that runs automatically for each line of business.

Data preparation

  1. Associate a MaxCompute compute resource with the workspace.

  2. Go to Data Studio and create a MaxCompute SQL node.

  3. Create the source tables: add the following code to the node, select it, and run it.

    -- E-commerce user behavior table
    CREATE TABLE IF NOT EXISTS dwd_user_behavior_ecom_d (
        user_id     STRING COMMENT 'User ID',
        action_type STRING COMMENT 'Behavior type',
        event_time  BIGINT COMMENT 'Millisecond-level UNIX timestamp of the event occurrence'
    )
    COMMENT 'Details of e-commerce user behavioral logs'
    PARTITIONED BY (dt STRING COMMENT 'Date partition in yyyymmdd format');
    
    INSERT OVERWRITE TABLE dwd_user_behavior_ecom_d PARTITION (dt='20251010') VALUES
    ('user001', 'click',        1760004060000), -- 2025-10-10 10:01:00.000
    ('user002', 'browse',       1760004150000), -- 2025-10-10 10:02:30.000
    ('user001', 'add_to_cart',  1760004300000); -- 2025-10-10 10:05:00.000
    -- Verify that the e-commerce user behavior table is created.
    SELECT * FROM dwd_user_behavior_ecom_d where dt='20251010';
    
    -- Finance user behavior table
    CREATE TABLE IF NOT EXISTS dwd_user_behavior_finance_d (
        user_id     STRING COMMENT 'User ID',
        action_type STRING COMMENT 'Behavior type',
        event_time  BIGINT COMMENT 'Millisecond-level UNIX timestamp of the event occurrence'
    )
    COMMENT 'Details of finance user behavioral logs'
    PARTITIONED BY (dt STRING COMMENT 'Date partition in yyyymmdd format');
    
    INSERT OVERWRITE TABLE dwd_user_behavior_finance_d PARTITION (dt='20251010') VALUES
    ('user003', 'open_app',      1760020200000), -- 2025-10-10 14:30:00.000
    ('user003', 'transfer',      1760020215000), -- 2025-10-10 14:30:15.000
    ('user003', 'check_balance', 1760020245000), -- 2025-10-10 14:30:45.000
    ('user004', 'open_app',      1760020300000); -- 2025-10-10 14:31:40.000
    -- Verify that the finance user behavior table is created.
    SELECT * FROM dwd_user_behavior_finance_d where dt='20251010';
    
    -- Logistics user behavior table
    CREATE TABLE IF NOT EXISTS dwd_user_behavior_logistics_d (
        user_id     STRING COMMENT 'User ID',
        action_type STRING COMMENT 'Behavior type',
        event_time  BIGINT COMMENT 'Millisecond-level UNIX timestamp of the event occurrence'
    )
    COMMENT 'Details of logistics user behavioral logs'
    PARTITIONED BY (dt STRING COMMENT 'Date partition in yyyymmdd format');
    
    INSERT OVERWRITE TABLE dwd_user_behavior_logistics_d PARTITION (dt='20251010') VALUES
    ('user001', 'check_status',    1760032800000), -- 2025-10-10 18:00:00.000
    ('user005', 'schedule_pickup', 1760032920000); -- 2025-10-10 18:02:00.000
    
    -- Verify that the logistics user behavior table is created.
    SELECT * FROM dwd_user_behavior_logistics_d where dt='20251010';
  4. Create the destination table: add the following code to the node, select it, and run it.

    Important

    If the workspace uses standard mode, deploy this node to the production environment and perform a data backfill.

    CREATE TABLE IF NOT EXISTS dws_user_summary_d (
        user_id     STRING COMMENT 'User ID',
        pv          BIGINT COMMENT 'Daily page views'
    )
    COMMENT 'Daily user page view aggregate table'
    PARTITIONED BY (
        dt           STRING COMMENT 'Date partition in yyyymmdd format',
        biz_line     STRING COMMENT 'Line-of-business partition, such as ecom, finance, logistics'
    );

Workflow implementation

  1. Create a workflow. In the Scheduling pane on the right, set the scheduling parameter bizdate to the previous day: $[yyyymmdd-1].

  2. In the workflow, create an assignment node named get_biz_list. Write the following MaxCompute SQL code. This node outputs the list of lines of business to process.

    -- Output all lines of business to be processed
    SELECT 'ecom' AS biz_line
    UNION ALL
    SELECT 'finance' AS biz_line
    UNION ALL
    SELECT 'logistics' AS biz_line;
  3. Configure the for-each node.

    • Return to the workflow canvas and create a for-each node downstream of the assignment node.

    • Go to the for-each node settings. On the Scheduling tab, under Scheduling Parameters > Script Parameters, set loopDataArray to the outputs of the get_biz_list node.

    • In the for-each node loop body, click Create Internal Node. Create a MaxCompute SQL node and write the following processing logic.

      Note

      This script runs once for each line of business. The built-in variable ${dag.foreach.current} is replaced with the current line-of-business name at runtime (ecom, finance, or logistics).

      SET odps.sql.allow.dynamic.partition=true;
      
      INSERT OVERWRITE TABLE dws_user_summary_d PARTITION (dt='${bizdate}', biz_line)
      SELECT
          user_id,
          COUNT(*) AS pv,
          '${dag.foreach.current}' AS biz_line
      FROM
          dwd_user_behavior_${dag.foreach.current}_d
      WHERE
          dt = '${bizdate}'
      GROUP BY
          user_id;
  4. Add a verification node. Return to the workflow canvas. Create a downstream MaxCompute SQL node for the for-each node and add the following code.

    SELECT * FROM dws_user_summary_d WHERE dt='20251010' ORDER BY biz_line, user_id;
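Conceptually, the for-each node expands the assignment node's output into one run of the loop body per element. The following is a local bash analogy of that expansion, not DataWorks itself; each element plays the role of dag.foreach.current and is spliced into the source table name:

```shell
# Bash analogy of the for-each expansion: one iteration per element
# of the assignment node's output list.
tables=$(for biz in ecom finance logistics; do
  echo "dwd_user_behavior_${biz}_d"
done)
echo "$tables"
```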

Deploy and run

Deploy the workflow to the production environment. In Operation Center, go to Auto Triggered Node O&M > Auto Triggered Nodes, find the workflow, run a test, and select '20251010' as the data timestamp.

After the run completes, view the run log in the test instance. The expected output of the final node is:

user_id    pv    dt          biz_line
user001    2     20251010    ecom
user002    1     20251010    ecom
user003    3     20251010    finance
user004    1     20251010    finance
user001    1     20251010    logistics
user005    1     20251010    logistics

Advantages

  • High extensibility: To add a new line of business, add one line of SQL in the assignment node. No changes to the processing logic are needed.

  • Easy maintenance: All lines of business share the same processing logic—a change in one place applies to all.

FAQ

Why do I get "find no select sql in sql assignment!" in MaxCompute SQL?

The MaxCompute SQL script is missing a SELECT statement. Add one. Note that the WITH clause is not supported; using it also triggers this error.

Why do I get "OutPut Result is null, cannot handle!" in Shell or Python?

The script has no output. Check whether your code contains a print (Python 2) or echo (Shell) statement.

How do I handle output elements that contain commas in Shell or Python?

Escape commas in element values by replacing , with \,. The following Python 2 example shows how:

categories = ["Electronics", "Clothing, Shoes & Accessories"]

# Escape commas in each element
escaped_categories = [cat.replace(",", "\,") for cat in categories]

# Join with commas
output_string = ",".join(escaped_categories)
print output_string
# Final output: Electronics,Clothing\, Shoes & Accessories

Can a downstream node receive results from multiple assignment nodes?

Yes. Assign the results from different assignment nodes to different input parameters on the downstream node.

Does the assignment node support other languages?

The assignment node supports MaxCompute SQL, Python 2, and Shell. Some node types—including EMR Hive, Hologres SQL, EMR Spark SQL, AnalyticDB for PostgreSQL, ClickHouse SQL, and MySQL—have a built-in parameter assignment feature that provides the same functionality.
