All Products
Search
Document Center

DataWorks:Assignment node

Last Updated:Jun 21, 2026

Use an assignment node to pass the query result or output from an upstream node to a downstream node. An assignment node supports MaxCompute SQL, Python 2, and Shell. It automatically assigns the result of the last query or output statement to an output parameter named outputs. Downstream nodes can then reference this parameter to retrieve the result.

Prerequisites

  • Edition requirements: This feature is available only in DataWorks Standard Edition and later.

  • Permission requirements: Your RAM account must be added to the corresponding workspace and granted the Development or Workspace Manager role. For more information, see Add members to a workspace.

Key concepts: Parameter passing and referencing

The core function of an assignment node is parameter passing, which passes data generated by an upstream node to a downstream node.

  • Upstream assignment node: Generates data. It automatically assigns the last output or query result to an output parameter named outputs.

  • Downstream business node: Receives and uses the data. To use the data, add an input parameter, such as param, to the node configuration and configure it to reference the upstream node's outputs parameter. You can then use this parameter in your code to access the data.

Parameter format

The following table describes the format of the value passed by an assignment node.

Language

Value

Format

MaxCompute SQL

The output of the last SELECT statement.

The node passes the output to the downstream node as a two-dimensional array.

Python 2

The output of the last print statement.

The system splits the output string by commas (,) into a one-dimensional array.

For example, the last line of output from the assignment node is 'Electronics,Clothing,Books', which is passed to downstream components in the format ['Electronics','Clothing','Books'].
Important

If the output content contains a comma, you must escape it. For example, if you output 'Electronics,Clothing\, Shoes & Accessories', a downstream service will correctly parse it as ['Electronics', 'Clothing, Shoes & Accessories'].

Shell

The output of the last echo statement.

Procedure

The following example shows how to pass the result of an assignment node to a Shell node. In practice, you can use any node type as the downstream node.

  1. Configure the upstream assignment node

    In your workflow, create and edit an assignment node. Select MaxCompute SQL, Python 2, or Shell, and then write the code that generates the result to pass to the downstream node.

    print '10,20,30,40'
  2. Configure the downstream Shell node

    Create a Shell node and reference the result from the upstream node in the Shell node editor.

    1. In the Scheduling Settings panel on the right, select the Node Context Parameters tab.

    2. In the Input Parameters section, click Add parameters.

    3. In the dialog box that appears, select the outputs parameter of the upstream assignment node and specify a custom Parameter Name for the current node's input parameter, such as param.

      Note

      After you complete the configuration, a dependency is automatically created between the downstream node and the upstream assignment node.

    4. After configuring the parameter, you can use the format ${param} in the code of the downstream Shell node to reference the value from the upstream node.

  3. Run and verify

    1. Return to the workflow canvas, click Deploy in the toolbar, and select Full Deployment.

    2. Go to Node O&M > Auto Triggered Task O&M > Auto Triggered Task and run a smoke test on the workflow.

    3. In the test instance, verify that the result is as expected.

Limitations

  • Passing level: An assignment node can pass parameters only to its direct downstream nodes. It does not support passing parameters across multiple levels of nodes.

  • Size limit: The passed value can be a maximum of 2 MB. If the output of the assignment statement exceeds this limit, the assignment node fails to run.

  • Syntax restrictions:

    • Do not add comments to the code of an assignment node, as this can cause unexpected results.

    • MaxCompute SQL mode does not support the WITH clause.

Examples by language

The data format of the outputs parameter and the way downstream nodes reference it vary slightly depending on the language you use. The following examples use a Shell node as the downstream node to illustrate these differences.

Example 1: Pass a MaxCompute SQL query result

The assignment node passes the SQL query result to the downstream node as a two-dimensional array.

  • Upstream node (SQL assignment node) configuration

    Assume the following SQL code returns a result set of two rows and two columns:

    SELECT 'beijing', '1001'
    UNION ALL 
    SELECT 'hangzhou', '1002';
  • Downstream node (Shell node) configuration and output

    Add an input parameter named region to the Shell node and reference the outputs of the upstream SQL node.

    Write the following code to read the data:

    echo "Full result set: ${region}"
    echo "First row: ${region[0]}"
    echo "Second field of the first row: ${region[0][1]}"

    The code returns the following output:

    Full result set: beijing,1001
    hangzhou,1002
    First row: beijing,1001
    Second field of the first row: 1001

Example 2: Pass a Python 2 output result

The system splits the output of a Python 2 print statement by commas (,) and passes it to the downstream node as a one-dimensional array.

  • Upstream node (Python 2 assignment node) configuration

    The Python 2 code is as follows:

    print 'Electronics, Clothing, Books';
  • Downstream node (Shell node) configuration and output

    Add an input parameter named types to the Shell node and reference the outputs of the upstream assignment node.

    Write the following code to read the data:

    # Directly output the entire one-dimensional array.
    echo "Full result set: ${types}"
    
    # Output an element from the array by its index.
    echo "Second element: ${types[1]}"

    The code returns the following output:

    Full result set: Electronics,Clothing,Books
    Second element: Clothing
Note

The processing logic for a Shell node is similar to that for a Python 2 node and is therefore not repeated here.

Use case: Batch processing for multiple business lines

This example shows how to use an assignment node and a for-each node to batch process user behavior data for multiple lines of business. This enables automated data processing where a single set of logic serves multiple product lines.

Background

Assume you are a data engineer at a large company responsible for processing data from three core lines of business: e-commerce (ecom), finance (finance), and logistics (logistics), with the possibility of adding more in the future. You need to apply the same aggregation logic to the user behavior logs of these lines of business every day to calculate the daily page views (PV) for each user and store the results in a unified summary table.

  • Upstream source tables (DWD layer):

    • dwd_user_behavior_ecom_d: E-commerce user behavior table.

    • dwd_user_behavior_finance_d : Finance user behavior table.

    • dwd_user_behavior_logistics_d : Logistics user behavior table.

    • dwd_user_behavior_${line_of_business}_d : User behavior tables for other potential lines of business.

    • These tables have the same schema and are partitioned by day (dt).

  • Downstream destination table (DWS layer):

    • dws_user_summary_d : User summary table.

    • This table is partitioned by line of business (biz_line) and day (dt) to store the aggregated results from all lines of business.

Using a for-each node, you only need to maintain one set of processing logic, and the system automatically iterates through all lines of business to complete the calculations.

Data preparation

First, create the example tables and insert test data. This example uses the business date 20251010.

  1. Associate a MaxCompute compute resource with the workspace.

  2. Go to Data Studio and create a MaxCompute SQL node.

  3. Create the source tables (DWD layer). In the MaxCompute SQL node, add the following code, select it, and run it.

    -- E-commerce user behavior table
    CREATE TABLE IF NOT EXISTS dwd_user_behavior_ecom_d (
        user_id     STRING COMMENT 'User ID',
        action_type STRING COMMENT 'Behavior type',
        event_time  BIGINT COMMENT 'Event time as a millisecond-level Unix timestamp'
    ) 
    COMMENT 'E-commerce user behavior log detail table'
    PARTITIONED BY (dt STRING COMMENT 'Date partition in yyyymmdd format');
    INSERT OVERWRITE TABLE dwd_user_behavior_ecom_d PARTITION (dt='20251010') VALUES
    ('user001', 'click',        1760004060000), -- 2025-10-10 10:01:00.000
    ('user002', 'browse',       1760004150000), -- 2025-10-10 10:02:30.000
    ('user001', 'add_to_cart',  1760004300000); -- 2025-10-10 10:05:00.000
    -- Verify that the e-commerce user behavior table is created.
    SELECT * FROM dwd_user_behavior_ecom_d where dt='20251010';
    -- Finance user behavior table
    CREATE TABLE IF NOT EXISTS dwd_user_behavior_finance_d (
        user_id     STRING COMMENT 'User ID',
        action_type STRING COMMENT 'Behavior type',
        event_time  BIGINT COMMENT 'Event time as a millisecond-level Unix timestamp'
    ) 
    COMMENT 'Finance user behavior log detail table'
    PARTITIONED BY (dt STRING COMMENT 'Date partition in yyyymmdd format');
    INSERT OVERWRITE TABLE dwd_user_behavior_finance_d PARTITION (dt='20251010') VALUES
    ('user003', 'open_app',      1760020200000), -- 2025-10-10 14:30:00.000
    ('user003', 'transfer',      1760020215000), -- 2025-10-10 14:30:15.000
    ('user003', 'check_balance', 1760020245000), -- 2025-10-10 14:30:45.000
    ('user004', 'open_app',      1760020300000); -- 2025-10-10 14:31:40.000
    -- Verify that the finance user behavior table is created.
    SELECT * FROM dwd_user_behavior_finance_d where dt='20251010';
    -- Logistics user behavior table
    CREATE TABLE IF NOT EXISTS dwd_user_behavior_logistics_d (
        user_id     STRING COMMENT 'User ID',
        action_type STRING COMMENT 'Behavior type',
        event_time  BIGINT COMMENT 'Event time as a millisecond-level Unix timestamp'
    ) 
    COMMENT 'Logistics user behavior log detail table'
    PARTITIONED BY (dt STRING COMMENT 'Date partition in yyyymmdd format');
    INSERT OVERWRITE TABLE dwd_user_behavior_logistics_d PARTITION (dt='20251010') VALUES
    ('user001', 'check_status',    1760032800000), -- 2025-10-10 18:00:00.000
    ('user005', 'schedule_pickup', 1760032920000); -- 2025-10-10 18:02:00.000
    -- Verify that the logistics user behavior table is created.
    SELECT * FROM dwd_user_behavior_logistics_d where dt='20251010';
  4. Create the destination table (DWS layer). In the MaxCompute SQL node, add the following code, select it, and run it.

    CREATE TABLE IF NOT EXISTS dws_user_summary_d (
        user_id     STRING COMMENT 'User ID',
        pv          BIGINT COMMENT 'Daily page views'
    ) 
    COMMENT 'User daily page views summary table'
    PARTITIONED BY (
        dt           STRING COMMENT 'Date partition in yyyymmdd format',
        biz_line     STRING COMMENT 'Line of business partition, such as ecom, finance, or logistics'
    );
    Important

    If your workspace uses the standard development environment, you must deploy this node to the production environment and perform a data backfill.

Workflow implementation

  1. Create a workflow. In the Scheduling Parameters panel on the right, set the scheduling parameter bizdate to the previous day: $[yyyymmdd-1].

  2. In the workflow, create an assignment node named get_biz_list. Use the MaxCompute SQL language to write the following code. This node outputs a list of the lines of business to be processed.

    -- Output all lines of business to be processed.
    SELECT 'ecom' AS biz_line
    UNION ALL
    SELECT 'finance' AS biz_line
    UNION ALL
    SELECT 'logistics' AS biz_line;
  3. Configure the for-each node

    • Return to the workflow canvas and create a downstream for-each node for the get_biz_list assignment node.

    • Go to the for-each node settings panel. In the Scheduling panel on the right, under the Scheduling Parameters > Script Parameters tab, bind the loopDataArray parameter to the outputs of the get_biz_list node.

    • In the for-each node loop body, click Create Internal Node to create a MaxCompute SQL node, and then write the processing logic for the loop body.

      Note
      • This script is driven by the for-each node and runs once for each line of business.

      • At runtime, the system dynamically replaces the built-in variable ${dag.foreach.current} with the current line of business name. The expected iteration values are 'ecom', 'finance', and 'logistics'.

      SET odps.sql.allow.dynamic.partition=true;
      INSERT OVERWRITE TABLE dws_user_summary_d PARTITION (dt='${bizdate}', biz_line)
      SELECT
          user_id,
          COUNT(*) AS pv,
          '${dag.foreach.current}' AS biz_line
      FROM
          dwd_user_behavior_${dag.foreach.current}_d
      WHERE
          dt = '${bizdate}'
      GROUP BY
          user_id;
  4. Add a verification node

    Return to the workflow canvas. For the for-each node, click Create Downstream to create a MaxCompute SQL node, and then add the following code.

    SELECT * FROM dws_user_summary_d WHERE dt='20251010' ORDER BY biz_line, user_id;

Deployment and results

Deploy the workflow to the production environment. Go to Auto Triggered Task O&M > Auto Triggered Task O&M > Auto Triggered Task, find the workflow, and run a smoke test. Select '20251010' as the business date.

After the run completes, the final node produces the following output:

user_id

pv

dt

biz_line

user001

2

20251010

ecom

user002

1

20251010

ecom

user003

3

20251010

finance

user004

1

20251010

finance

user001

1

20251010

logistics

user005

1

20251010

logistics

Benefits

  • High scalability: To add a new line of business, you only need to add one line of SQL to the assignment node without modifying the processing logic.

  • Easy maintenance: All lines of business share the same processing logic. A change in one place applies to all.

FAQ

  • Q: In MaxCompute SQL mode, I get the error "find no select sql in sql assignment!". What should I do?

    A: The MaxCompute SQL statement is missing a SELECT statement. Add a SELECT statement. WITH statements are not supported and also cause this error.

  • Q: In Shell or Python mode, I get the error "OutPut Result is null, cannot handle!". What should I do?

    A: The script has no output. Check whether your code contains a print statement (print or echo).

  • Q: In Shell or Python mode, how do I handle output elements that contain commas?

    A: You need to escape the comma (,) by replacing it with \,. The following Python code shows an example:

    categories = ["Electronics", "Clothing, Shoes & Accessories"]
    # Escape commas in each element by replacing ',' with '\,'.
    escaped_categories = [cat.replace(",", "\,") for cat in categories]
    # Join the escaped elements with a comma.
    output_string = ",".join(escaped_categories)
    print output_string
    # The final string passed to the downstream node is:
    # Electronics,Clothing\, Shoes & Accessories
  • Q: Can a downstream node be configured to receive results from multiple upstream assignment nodes?

    A: Yes. You only need to assign the results of different nodes to different parameters.

    image

  • Q: Does the assignment node support other languages?

    A: The assignment node currently supports only MaxCompute SQL, Python 2, and Shell. Some nodes, such as EMR Hive, Hologres SQL, EMR Spark SQL, AnalyticDB for PostgreSQL, ClickHouse SQL, and MySQL nodes, have a built-in assignment parameter feature that serves the same purpose.

    In the Output Parameters section, click + Create Parameter.

Related reading