Use DataWorks to develop a MaxCompute script task - MaxCompute

DataWorks provides ODPS Script nodes for developing MaxCompute script tasks in the code editor. This topic describes how to use an ODPS Script node to write and run a MaxCompute script task.

Prerequisites

Before you begin, ensure that you have:

An ODPS Script node created. For more information, see Create and manage ODPS nodes

How it works

MaxCompute's SQL engine supports script mode, which lets you write multiple SQL statements in a single script file and compile them as a whole. Each statement does not need to be compiled separately.

After compilation, the script is submitted to MaxCompute, which generates a unified execution plan. All statements run in one queue and are executed once — letting the script fully use MaxCompute's compute resources. For more information about script mode SQL, see SQL in script mode.

In DataWorks, ODPS Script nodes let you develop code for MaxCompute script tasks in script mode and integrate those tasks with other tasks in your workflow.

Use cases

Script mode is well-suited for these scenarios:

Rewriting nested subqueries or splitting complex scripts: A single statement with deeply nested subqueries needs to be flattened into multiple connected statements, or a complex script must be split into multiple statements.
Scalar variable computation: Assign a constant to a variable, convert it to a scalar value with SELECT * FROM @variable, and use that value alongside other columns in calculations. For example:
```
@a := SELECT 10; -- Assign the constant 10 to @a. You can also assign a single-row value from a table: SELECT col1 FROM t1.
@b := SELECT key, value + (SELECT * FROM @a) FROM t2 WHERE key > 10000;
SELECT * FROM @b;
```
For more information about the SELECT * FROM @variable syntax, see Subqueries.

Script mode is not suitable when multiple data sources have large preparation time gaps between them. For example, if one data source is ready at 01:00 and another at 07:00, table variables are not a good fit for packaging such statements together, because a script job starts only after all input data is ready.

Syntax

A complete MaxCompute script has the following structure. Include zero or more statements of each type, but do not mix statement types within the same section.

-- SET statements (zero or more)
set odps.sql.type.system.odps2=true;
[set odps.stage.reducer.num=***;]
[...]

-- DDL statements (zero or more)
create table table1 xxx;
[create table table2 xxx;]
[...]

-- DML statements (zero or more)
-- @var holds a result set, not a scalar value.
-- To use a result set as a scalar, apply SELECT * FROM @var in a scalar subquery context.
@var1 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM table3
    [WHERE where_condition];
@var2 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM table4
    [WHERE where_condition];
@var3 := SELECT [ALL | DISTINCT] var1.select_expr, var2.select_expr, ...
    FROM @var1 JOIN @var2 ON ...;
INSERT OVERWRITE|INTO TABLE [PARTITION (partcol1=val1, partcol2=val2 ...)]
    SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM @var3;
[@var4 := SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var1
    UNION ALL | UNION
    SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var2;
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
    AS
    SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM var4;]

-- IF statement (optional, for conditional branching)
IF (<condition>) BEGIN
  ...
END ELSE BEGIN
  ...
END

Variable semantics: Each @var holds a result set, not a scalar value. To use a result set as a scalar, apply SELECT * FROM @var in a scalar subquery context.

Limitations

Keep the following constraints in mind when writing ODPS Script node code:

Supported statement types: SET statements, DML statements, and a subset of DDL statements. DDL statements that return display output — such as DESC and SHOW — are not supported.
One display-output statement per script: A script may contain only one statement that returns visible output (such as a SELECT statement). Including more than one causes an error. Avoid placing result-displaying SELECT statements in production scripts.
One CREATE TABLE AS statement per script, placed last: Only one CREATE TABLE AS statement is allowed per script, and it must be the final statement. Write CREATE TABLE and INSERT statements in separate sections to keep them distinct.
All-or-nothing execution: If any statement in a script fails, the entire script fails.
All input data must be ready before a job starts: A job is generated only after all input data for the script is available. Design your dependencies accordingly.

No write-then-read on the same table within a script: Because all statements compile into a single execution plan, writing to a table and then reading from that same table in the same script causes an error. For example, the following script fails:

-- Fails: writes to src2, then tries to read from src2
INSERT OVERWRITE TABLE src2 SELECT * FROM src WHERE key > 0;
@a := SELECT * FROM src2;
SELECT * FROM @a;

Rewrite it using table variables to avoid the conflict:

-- Works: reads from src, stores in @a, writes @a to src2, and also reads from @a
@a := SELECT * FROM src WHERE key > 0;
INSERT OVERWRITE TABLE src2 SELECT * FROM @a;
SELECT * FROM @a;

Write a script

ODPS Script nodes let you write SQL in a style similar to a general-purpose programming language. Use @var variables to connect statements and control data flow, rather than relying on intermediate tables.

The following script creates two destination tables from three source tables, using joins and a union to combine results:

CREATE TABLE IF NOT EXISTS dest (key STRING, value BIGINT);
CREATE TABLE IF NOT EXISTS dest2 (key STRING, value BIGINT);

@a := SELECT * FROM src WHERE value > 0;
@b := SELECT * FROM src2 WHERE key IS NOT NULL;
@c := SELECT * FROM src3 WHERE value IS NOT NULL;

@d := SELECT a.key, b.value FROM @a LEFT OUTER JOIN @b ON a.key = b.key AND b.value > 0;
@e := SELECT a.key, c.value FROM @a INNER JOIN @c ON a.key = c.key;
@f := SELECT * FROM @d UNION SELECT * FROM @e UNION SELECT * FROM @a;

INSERT OVERWRITE TABLE dest SELECT * FROM @f;

@g := SELECT e.key, c.value FROM @e JOIN @c ON e.key = c.key;
INSERT OVERWRITE TABLE dest2 SELECT * FROM @g;

Note

This example uses tables src, src2, and src3. Use the following statements to create and populate them:

CREATE TABLE IF NOT EXISTS src (key STRING, value BIGINT);
INSERT INTO src VALUES ('1', 11);
INSERT INTO src VALUES ('1', 11);

CREATE TABLE IF NOT EXISTS src2 (key STRING, value BIGINT);
INSERT INTO src2 VALUES ('1', 22);
INSERT INTO src2 VALUES ('2', 22);

CREATE TABLE IF NOT EXISTS src3 (key STRING, value BIGINT);
INSERT INTO src3 VALUES ('1', 33);
INSERT INTO src3 VALUES ('3', 33);

Use IF statements

Script mode supports IF statements for conditional branching. The branch that runs depends on the condition type.

Branch on a Boolean expression (evaluated at compile time)

Use a Boolean expression when the branch decision is known before job execution — for example, when the condition depends on a scheduling parameter.

Use case: Read from different source tables based on whether a date parameter is odd or even.

set odps.sql.allow.fullscan=true;
set odps.optimizer.cbo.rule.filter.black=LM;

@date := '${var}';
@row TABLE(key INT, value BIGINT); -- Declare @row as a table variable.

IF (CAST(@date AS BIGINT) % 2 == 0) BEGIN
  @row := SELECT key, value FROM src1;
END ELSE BEGIN
  @row := SELECT key, value FROM src2;
END

INSERT OVERWRITE TABLE dest1 PARTITION(p='${var}') SELECT key, value FROM @row;

Note

The var variable is referenced in the script as ${var}. Assign its value in the Scheduling Parameter section of the Properties tab before running the node.

Branch on a scalar subquery (evaluated at run time)

Use a Boolean-type scalar subquery when the branch decision depends on actual data — for example, when the condition queries a count of rows. Because the condition is evaluated during job execution rather than compilation, MaxCompute submits multiple jobs.

Use case: Set a multiplier based on whether a condition in the source data is met.

@i BIGINT;
@t TABLE(id BIGINT, value BIGINT);

IF ((SELECT COUNT(*) FROM src WHERE a = '5') > 1) BEGIN
  @i := 1;
  @t := SELECT @i, @i * 2;
END ELSE BEGIN
  @i := 2;
  @t := SELECT @i, @i * 2;
END

SELECT id, value FROM @t;

Use embedded user-defined functions (UDFs)

Script mode supports embedding Java or Python code directly in an SQL script for inline UDF (user-defined function) development. For more information, see Code-embedded UDFs.

What's next

After completing your script development, continue with the following steps:

Configure scheduling properties: Set up periodic scheduling for the node, including rerun settings and scheduling dependencies. For more information, see Overview.
Debug the node: Test the code to verify the logic behaves as expected. For more information, see Debugging procedure.
Deploy the node: After development is complete, deploy the node so that MaxCompute runs it on the configured schedule. For more information, see Deploy nodes.