You can create an ODPS Script node to develop an SQL script by using the code editor provided by MaxCompute based on the SQL engine V2.0.

The ODPS Script node allows DataWorks to compile the SQL script as a whole, instead of compiling the SQL statements in the script one by one. In this way, the SQL script is committed and run as a whole. This guarantees that an execution plan is only queued and executed once, making full use of MaxCompute computing resources.

Create an ODPS Script node

  1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the target workspace and click Data Analytics in the Actions column.
  2. Move the pointer over the Create icon and choose MaxCompute > ODPS Script.

    You can also find the target workflow, right-click MaxCompute, and choose Create > ODPS Script.

  3. In the Create Node dialog box that appears, enter the node name, select the target folder, and click Commit.
    Note A node name can be up to 128 characters in length.
  4. Edit the SQL script of the ODPS Script node.

    You can edit the SQL script of the ODPS Script node in the code editor. For more information, see Write SQL.

    After you edit the code of the ODPS Script node, you can save the code and commit the node. For more information, see ODPS Script node configuration tab.

  5. Configure the node properties.

    Click the Properties tab in the right-side navigation pane. On the Properties tab that appears, set the relevant parameters. For more information, see Properties.

  6. Commit the node.

    After the node properties are configured, click the Save icon in the upper-left corner. Then, commit or commit and unlock the node to the development environment.

  7. Deploy the node.

    For more information, see Deploy a node.

  8. Test the node in the production environment.

SQL syntax and limits for ODPS Script nodes

Write SQL statements based on your business logic in a way similar to that of using a common programming language. You do not need to consider how to organize the SQL statements.
-- SET statements
set odps.sql.type.system.odps2=true;
[set odps.stage.reducer.num=***;]
[...]
-- Data definition language (DDL) statements
create table table1 xxx;
[create table table2 xxx;]
[...]
-- Data manipulation language (DML) statements
@var1 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM table3
    [WHERE where_condition];
@var2 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM table4
    [WHERE where_condition];
@var3 := SELECT [ALL | DISTINCT] var1.select_expr, var2.select_expr, ...
    FROM @var1 join @var2 on ... ;
INSERT OVERWRITE|INTO TABLE [PARTITION (partcol1=val1, partcol2=val2 ...)]
    SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM @var3;
[@var4 := SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var1
    UNION ALL | UNION
    SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var2;
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
    AS
    SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM var4;]
Note
  • ODPS Script nodes support SET statements, DML statements, and some DDL statements. The DDL statements used to return data, such as DESC and SHOW statements, are not supported.
  • A complete script consists of SET statements, DDL statements, and DML statements in sequence. You can write one or more statements of each type, or even skip a type without writing any statements of that type. However, you cannot mix different types of statements together. You must strictly follow the sequence of SET statements > DDL statements > DML statements.
  • The at signs (@) residing before some statements indicate that these statements are connected through variables.
  • A script supports only one statement that returns data, such as an independent SELECT statement. If multiple such statements are provided, an error occurs. We do not recommend that you use SELECT statements in a script.
  • A script supports only one CREATE TABLE AS statement, which must be the last statement. We recommend that you put CREATE TABLE statements and INSERT statements in different sections to separate them.
  • If one statement in a script fails, the whole script fails.
  • A job is generated to process data only after all the input data is prepared for a script.
  • If a script writes data to a table and then reads the table, an error occurs. For example, an error occurs for the following statements:
    insert overwrite table src2 select * from src where key > 0;
    @a := select * from src2;
    select * from @a;
    To avoid the error, edit the statements as follows:
    @a := select * from src where key > 0;
    insert overwrite table src2 select * from @a;
    select * from @a;
Sample script:
create table if not exists dest(key string , value bigint) partitioned by (d string);
create table if not exists dest2(key string,value bigint) partitioned by (d string);
@a := select * from src where value >0;
@b := select * from src2 where key is not null;
@c := select * from src3 where value is not null;
@d := select a.key,b.value from @a left outer join @b on a.key=b.key and b.value>0;
@e := select a.key,c.value from @a inner join @c on a.key=c.key;
@f := select * from @d union select * from @e union select * from @a;
insert overwrite table dest partition (d='20171111') select * from @f;
@g := select e.key,c.value  from @e join @c on e.key=c.key;
insert overwrite table dest2 partition (d='20171111') SELECT * from @g;

Application scenarios of ODPS Script node

  • You can use an ODPS Script node to rewrite a single statement with nested subqueries, or a script that must be split into multiple statements due to its complexity.
  • Data from different data stores may be prepared at different time points, and the time difference may be large. For example, the data from one data store can be prepared at 01:00, whereas that from the other data store can be prepared at 07:00. In this case, table variables are not suitable for connecting statements. You can use an ODPS Script node to combine the statements to a script.