You can create an ODPS Script node to develop an SQL script by using the code editor provided by the MaxCompute V2.0 SQL engine.

Background information

An ODPS Script node allows DataWorks to compile an SQL script as a whole, instead of compiling the SQL statements in the script one by one. This way, the SQL script is committed and run as a whole. This ensures that an execution plan is queued and executed only once, making full use of MaxCompute computing resources.
ODPS Script nodes allow you to write SQL statements based on your business logic in a way similar to that of using a common programming language. You do not need to consider how to organize the SQL statements.
-- SET statements
set odps.sql.type.system.odps2=true;
[set odps.stage.reducer.num=***;]
[...]
-- Data definition language (DDL) statements
create table table1 xxx;
[create table table2 xxx;]
[...]
-- Data manipulation language (DML) statements
@var1 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM table3
    [WHERE where_condition];
@var2 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM table4
    [WHERE where_condition];
@var3 := SELECT [ALL | DISTINCT] var1.select_expr, var2.select_expr, ...
    FROM @var1 join @var2 on ... ;
INSERT OVERWRITE|INTO TABLE [PARTITION (partcol1=val1, partcol2=val2 ...)]
    SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM @var3;
[@var4 := SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var1
    UNION ALL | UNION
    SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var2;
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
    AS
    SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM var4;]
ODPS Script nodes have the following limits:
  • ODPS Script nodes support SET statements, DML statements, and some DDL statements. The DDL statements, such as DESC and SHOW statements, that are used to return data are not supported.
  • A complete script consists of SET statements, DDL statements, and DML statements in sequence. You can write one or more statements of each type, or skip a type without writing any statements of that type. However, you cannot mix different types of statements together. You must strictly follow this sequence: SET statements > DDL statements > DML statements.
  • The at signs (@) preceding several statements indicate that these statements are connected by using variables.
  • You can write only one statement, for example, a SELECT statement, that returns data in a script. If you write multiple such statements in a script, an error occurs. We recommend that you do not use SELECT statements in a script.
  • You can write only one CREATE TABLE AS statement in a script, and this statement must be the last statement in the script. We recommend that you write CREATE TABLE statements and INSERT statements in different sections to separate them.
  • If one statement in a script fails, the whole script fails.
  • A job is generated to process data only after all the input data is prepared for a script.
  • If you specify a statement for writing data to a table and then a statement for reading data from the table in the same script, an error occurs. For example, if you write the following statements in a script, an error occurs:
    insert overwrite table src2 select * from src where key > 0;
    @a := select * from src2;
    select * from @a;
    To avoid the error, write the statements in the following format:
    @a := select * from src where key > 0;
    insert overwrite table src2 select * from @a;
    select * from @a;
    Sample script:
    create table if not exists dest(key string , value bigint) partitioned by (d string);
    create table if not exists dest2(key string,value bigint ) partitioned by (d string);
    @a := select * from src where value >0;
    @b := select * from src2 where key is not null;
    @c := select * from src3 where value is not null;
    @d := select a.key,b.value from @a left outer join @b on a.key=b.key and b.value>0;
    @e := select a.key,c.value from @a inner join @c on a.key=c.key;
    @f := select * from @d union select * from @e union select * from @a;
    insert overwrite table dest partition (d='20171111') select * from @f;
    @g := select e.key,c.value  from @e join @c on e.key=c.key;
    insert overwrite table dest2 partition (d='20171111') SELECT * from @g;
    ODPS Script nodes are applicable to the following scenarios:
    • You can use an ODPS Script node to rewrite a single statement with nested subqueries, or a script that must be split into multiple statements to make it simpler.
    • Data from different data stores may be ready at different time points, and the time difference may be large. For example, the data from one data store can be ready at 01:00, whereas that from the other data store can be ready at 07:00. In this case, table variables are not suitable for connecting statements. You can use an ODPS Script node to combine the statements to a script.

Procedure

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where the workspace resides. On the Workspaces page, find the workspace in which you want to create tables, and click DataStudio in the Actions column.
  2. On the Data Development tab, move the pointer over the Create icon icon and choose Create Node > MaxCompute > ODPS Script.
    Alternatively, you can click a workflow in the Business process section, right-click MaxCompute, and then choose Create Node > ODPS Script.
  3. In the Create Node dialog box, configure the Name and Path parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Confirm.
  5. On the node configuration tab, write the SQL script of the ODPS Script node. For more information, see Develop and submit an SQL script.
  6. On the node configuration tab, click the Scheduling configuration tab in the right-side navigation pane. On the Scheduling configuration tab, set the scheduling properties for the node. For more information, see Configure basic properties.
  7. Save and commit the node.
    Important You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the node.
    1. Click the Save icon in the top toolbar to save the node.
    2. Click the Submit icon in the toolbar.
    3. In the Commit Node dialog box, configure the Change description parameter.
    4. Click OK.
    If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner to deploy the node after you commit it. For more information, see Deploy nodes.
  8. Perform O&M operations on the node. For more information, see Perform basic O&M operations on auto triggered nodes.