You can create an ODPS Script node to develop an SQL script by using the code editor provided by the MaxCompute V2.0 SQL engine.

Background information

An ODPS Script node allows DataWorks to compile an SQL script as a whole, instead of compiling the SQL statements in the script one by one. This way, the SQL script is committed and run as a whole. This ensures that an execution plan is queued and executed only once, making full use of MaxCompute computing resources.
ODPS Script nodes allow you to write SQL statements based on your business logic in a way similar to that of using a common programming language. You do not need to consider how to organize the SQL statements.
-- SET statements
set odps.sql.type.system.odps2=true;
[set odps.stage.reducer.num=***;]
[...]
-- Data definition language (DDL) statements
create table table1 xxx;
[create table table2 xxx;]
[...]
-- Data manipulation language (DML) statements
@var1 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM table3
    [WHERE where_condition];
@var2 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM table4
    [WHERE where_condition];
@var3 := SELECT [ALL | DISTINCT] var1.select_expr, var2.select_expr, ...
    FROM @var1 join @var2 on ... ;
INSERT OVERWRITE|INTO TABLE [PARTITION (partcol1=val1, partcol2=val2 ...)]
    SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM @var3;
[@var4 := SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var1
    UNION ALL | UNION
    SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var2;
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
    AS
    SELECT [ALL | DISTINCT] select_expr, select_expr, ...
    FROM var4;]
ODPS Script nodes have the following limits:
  • ODPS Script nodes support SET statements, DML statements, and some DDL statements. The DDL statements, such as DESC and SHOW statements, that are used to return data are not supported.
  • A complete script consists of SET statements, DDL statements, and DML statements in sequence. You can write one or more statements of each type, or skip a type without writing any statements of that type. However, you cannot mix different types of statements together. You must strictly follow this sequence: SET statements > DDL statements > DML statements.
  • The at signs (@) preceding several statements indicate that these statements are connected by using variables.
  • You can write only one statement, for example, a SELECT statement, that returns data in a script. If you write multiple such statements in a script, an error occurs. We recommend that you do not use SELECT statements in a script.
  • You can write only one CREATE TABLE AS statement in a script, and this statement must be the last statement in the script. We recommend that you write CREATE TABLE statements and INSERT statements in different sections to separate them.
  • If one statement in a script fails, the whole script fails.
  • A job is generated to process data only after all the input data is prepared for a script.
  • If you specify a statement for writing data to a table and then a statement for reading data from the table in the same script, an error occurs. For example, if you write the following statements in a script, an error occurs:
    insert overwrite table src2 select * from src where key > 0;
    @a := select * from src2;
    select * from @a;
    To avoid the error, write the statements in the following format:
    @a := select * from src where key > 0;
    insert overwrite table src2 select * from @a;
    select * from @a;
    Sample script:
    create table if not exists dest(key string , value bigint) partitioned by (d string);
    create table if not exists dest2(key string,value bigint ) partitioned by (d string);
    @a := select * from src where value >0;
    @b := select * from src2 where key is not null;
    @c := select * from src3 where value is not null;
    @d := select a.key,b.value from @a left outer join @b on a.key=b.key and b.value>0;
    @e := select a.key,c.value from @a inner join @c on a.key=c.key;
    @f := select * from @d union select * from @e union select * from @a;
    insert overwrite table dest partition (d='20171111') select * from @f;
    @g := select e.key,c.value  from @e join @c on e.key=c.key;
    insert overwrite table dest2 partition (d='20171111') SELECT * from @g;
    ODPS Script nodes are applicable to the following scenarios:
    • You can use an ODPS Script node to rewrite a single statement with nested subqueries, or a script that must be split into multiple statements to make it simpler.
    • Data from different data stores may be ready at different time points, and the time difference may be large. For example, the data from one data store can be ready at 01:00, whereas that from the other data store can be ready at 07:00. In this case, table variables are not suitable for connecting statements. You can use an ODPS Script node to combine the statements to a script.

Procedure

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. On the Data Development tab, move the pointer over the Create icon icon and choose MaxCompute > ODPS Script.
    Alternatively, you can click a workflow in the Business process section, right-click MaxCompute, and then choose New > ODPS Script.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
  5. On the node configuration tab, write the SQL script of the ODPS Script node. For more information, see Develop and submit an SQL script.
  6. On the node configuration tab, click the Scheduling configuration tab in the right-side navigation pane. On the Scheduling configuration tab, set the scheduling properties for the node. For more information, see Basic properties.
  7. Save and commit the node.
    Notice You must set the Rerun and Parent Nodes parameters before you can commit the node.
    1. Click the Save icon in the toolbar to save the node.
    2. Click the Commit icon in the toolbar.
    3. In the Commit Node dialog box, enter your comments in the Change description field.
    4. Click OK.
    In a workspace in standard mode, you must click Deploy in the upper-right corner after you commit the node. For more information, see Deploy nodes.
  8. Test the node. For more information, see View auto triggered nodes.