You can create an ODPS Script node to develop an SQL script by using the code editor provided by the MaxCompute V2.0 SQL engine.
Background information
ODPS Script nodes allow you to write SQL statements based on your business logic in
a way similar to that of using a common programming language. You do not need to consider
how to organize the SQL statements.
-- SET statements
set odps.sql.type.system.odps2=true;
[set odps.stage.reducer.num=***;]
[...]
-- Data definition language (DDL) statements
create table table1 xxx;
[create table table2 xxx;]
[...]
-- Data manipulation language (DML) statements
@var1 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table3
[WHERE where_condition];
@var2 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table4
[WHERE where_condition];
@var3 := SELECT [ALL | DISTINCT] var1.select_expr, var2.select_expr, ...
FROM @var1 join @var2 on ... ;
INSERT OVERWRITE|INTO TABLE [PARTITION (partcol1=val1, partcol2=val2 ...)]
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM @var3;
[@var4 := SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var1
UNION ALL | UNION
SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var2;
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
AS
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM var4;]
ODPS Script nodes have the following limits:
- ODPS Script nodes support SET statements, DML statements, and some DDL statements.
The DDL statements, such as
DESC
andSHOW
statements, that are used to return data are not supported. - A complete script consists of SET statements, DDL statements, and DML statements in sequence. You can write one or more statements of each type, or skip a type without writing any statements of that type. However, you cannot mix different types of statements together. You must strictly follow this sequence: SET statements > DDL statements > DML statements.
- The at signs (@) preceding several statements indicate that these statements are connected by using variables.
- You can write only one statement, for example, a SELECT statement, that returns data in a script. If you write multiple such statements in a script, an error occurs. We recommend that you do not use SELECT statements in a script.
- You can write only one
CREATE TABLE AS
statement in a script, and this statement must be the last statement in the script. We recommend that you write CREATE TABLE statements and INSERT statements in different sections to separate them. - If one statement in a script fails, the whole script fails.
- A job is generated to process data only after all the input data is prepared for a script.
- If you specify a statement for writing data to a table and then a statement for reading
data from the table in the same script, an error occurs. For example, if you write
the following statements in a script, an error occurs:
insert overwrite table src2 select * from src where key > 0; @a := select * from src2; select * from @a;
To avoid the error, write the statements in the following format:@a := select * from src where key > 0; insert overwrite table src2 select * from @a; select * from @a;
Sample script:create table if not exists dest(key string , value bigint) partitioned by (d string); create table if not exists dest2(key string,value bigint ) partitioned by (d string); @a := select * from src where value >0; @b := select * from src2 where key is not null; @c := select * from src3 where value is not null; @d := select a.key,b.value from @a left outer join @b on a.key=b.key and b.value>0; @e := select a.key,c.value from @a inner join @c on a.key=c.key; @f := select * from @d union select * from @e union select * from @a; insert overwrite table dest partition (d='20171111') select * from @f; @g := select e.key,c.value from @e join @c on e.key=c.key; insert overwrite table dest2 partition (d='20171111') SELECT * from @g;
ODPS Script nodes are applicable to the following scenarios:- You can use an ODPS Script node to rewrite a single statement with nested subqueries, or a script that must be split into multiple statements to make it simpler.
- Data from different data stores may be ready at different time points, and the time difference may be large. For example, the data from one data store can be ready at 01:00, whereas that from the other data store can be ready at 07:00. In this case, table variables are not suitable for connecting statements. You can use an ODPS Script node to combine the statements to a script.