You can create an ODPS Script node to develop an SQL script by using the code editor provided by the MaxCompute V2.0 SQL engine.
ODPS Script nodes allow you to write SQL statements based on your business logic in a way similar to that of using a common programming language. You do not need to consider how to organize the SQL statements.
-- SET statements set odps.sql.type.system.odps2=true; [set odps.stage.reducer.num=***;] [...] -- Data definition language (DDL) statements create table table1 xxx; [create table table2 xxx;] [...] -- Data manipulation language (DML) statements @var1 := SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM table3 [WHERE where_condition]; @var2 := SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM table4 [WHERE where_condition]; @var3 := SELECT [ALL | DISTINCT] var1.select_expr, var2.select_expr, ... FROM @var1 join @var2 on ... ; INSERT OVERWRITE|INTO TABLE [PARTITION (partcol1=val1, partcol2=val2 ...)] SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM @var3; [@var4 := SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var1 UNION ALL | UNION SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var2; CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name AS SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM var4;]
ODPS Script nodes have the following limits:
- ODPS Script nodes support SET statements, DML statements, and some DDL statements.
The DDL statements, such as
SHOWstatements, that are used to return data are not supported.
- A complete script consists of SET statements, DDL statements, and DML statements in sequence. You can write one or more statements of each type, or skip a type without writing any statements of that type. However, you cannot mix different types of statements together. You must strictly follow this sequence: SET statements > DDL statements > DML statements.
- The at signs (@) preceding several statements indicate that these statements are connected by using variables.
- You can write only one statement, for example, a SELECT statement, that returns data in a script. If you write multiple such statements in a script, an error occurs. We recommend that you do not use SELECT statements in a script.
- You can write only one
CREATE TABLE ASstatement in a script, and this statement must be the last statement in the script. We recommend that you write CREATE TABLE statements and INSERT statements in different sections to separate them.
- If one statement in a script fails, the whole script fails.
- A job is generated to process data only after all the input data is prepared for a script.
- If you specify a statement for writing data to a table and then a statement for reading
data from the table in the same script, an error occurs. For example, if you write
the following statements in a script, an error occurs:
insert overwrite table src2 select * from src where key > 0; @a := select * from src2; select * from @a;To avoid the error, write the statements in the following format:
@a := select * from src where key > 0; insert overwrite table src2 select * from @a; select * from @a;Sample script:
create table if not exists dest(key string , value bigint) partitioned by (d string); create table if not exists dest2(key string,value bigint ) partitioned by (d string); @a := select * from src where value >0; @b := select * from src2 where key is not null; @c := select * from src3 where value is not null; @d := select a.key,b.value from @a left outer join @b on a.key=b.key and b.value>0; @e := select a.key,c.value from @a inner join @c on a.key=c.key; @f := select * from @d union select * from @e union select * from @a; insert overwrite table dest partition (d='20171111') select * from @f; @g := select e.key,c.value from @e join @c on e.key=c.key; insert overwrite table dest2 partition (d='20171111') SELECT * from @g;ODPS Script nodes are applicable to the following scenarios:
- You can use an ODPS Script node to rewrite a single statement with nested subqueries, or a script that must be split into multiple statements to make it simpler.
- Data from different data stores may be ready at different time points, and the time difference may be large. For example, the data from one data store can be ready at 01:00, whereas that from the other data store can be ready at 07:00. In this case, table variables are not suitable for connecting statements. You can use an ODPS Script node to combine the statements to a script.
- Go to the DataStudio page.
- Log on to the DataWorks console.
- In the left-side navigation pane, click Workspaces.
- In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
- On the Data Development tab, move the pointer over the icon and choose . Alternatively, you can click a workflow in the Business process section, right-click MaxCompute, and then choose .
- In the Create Node dialog box, set the Node Name and Location parameters.Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
- Click Commit.
- On the node configuration tab, write the SQL script of the ODPS Script node. For more information, see Develop and submit an SQL script.
- On the node configuration tab, click the Scheduling configuration tab in the right-side navigation pane. On the Scheduling configuration tab, set the scheduling properties for the node. For more information, see Basic properties.
- Save and commit the node.Notice You must set the Rerun and Parent Nodes parameters before you can commit the node.
In a workspace in standard mode, you must click Deploy in the upper-right corner after you commit the node. For more information, see Deploy nodes.
- Click the icon in the toolbar to save the node.
- Click the icon in the toolbar.
- In the Commit Node dialog box, enter your comments in the Change description field.
- Click OK.
- Test the node. For more information, see View auto triggered nodes.